Metodološki zvezki, Vol. 4, No. 1, 2007, 37-70

Estimating Poverty in the Italian Provinces using Small Area Estimation Models

Claudio Quintano, Rosalia Castellano, and Gennaro Punzo¹

Abstract

Sample survey data are broadly used to provide direct estimates of poverty for the whole population and for large areas or domains. This is one of the main limitations of poverty analysis at a sub-national level (i.e., for regions or provinces). Such areas are small in the sense that the domain-specific sample is not large enough to support direct estimates of adequate precision, so direct estimates are likely to have large standard errors because of the unduly small size of the sample in the area (Ghosh & Rao, 1994). The aim of our paper is to improve the quality of the estimation process, in terms of efficiency, for some poverty measures in the Italian provinces (NUTS3). The adopted approach is based on the Area Level Random Effect Model (Fay & Herriot, 1979), which relates small area direct estimators to domain-specific covariates and treats the random area effects as independent. Under that model, the Empirical Best Linear Unbiased Predictor (EBLUP) is obtained. We extend the analysis beyond the conventional measures of income poverty, which simply dichotomise the population into the "poor" and the "non-poor" by a threshold value, and we also consider a fuzzy monetary measure that treats poverty as a matter of degree (Cheli & Lemmi, 1995; Cheli, 1995). Through such an analysis, we determine some of the socio-economic factors contributing to poverty levels and living standards, and we investigate the territorial perspective in depth. In order to evaluate the performance of the estimation process through small area models and, consequently, the contribution of auxiliary information to composite poverty estimates, we define some outcome measures and compute some quality indicators (Rao, 2003).
These measures allow us to test the extent to which the modelling modifies the input direct estimates and the degree of improvement in accuracy provided by the modelling and, more generally, to evaluate the performance of small area estimators.

¹ Department of Statistics and Mathematics for Economic Research, University of Naples "Parthenope", Via Medina 40, 80133 Naples, Italy; quintcla@hotmail.com, lia.castellano@uniparthenope.it, gennaro.punzo@uniparthenope.it. This paper was supported by the 2006 Endowment Funds of the Department of Statistics and Mathematics for Economic Research of the University of Naples "Parthenope" in the framework of the research projects "Income Inequality" and "Data Quality". This work is coordinated by C. Quintano and is the result of the joint work of the authors: R. Castellano is the author of Sections 1 and 9; G. Punzo is the author of Sections 2, 3, 4, 5, 6, 7 and 8.

1 Background and introduction

Poverty is certainly an intuitive phenomenon, but its nature is so intricate and heterogeneous that it is very difficult to obtain an objective and unambiguous statistical definition. In fact, any integral approach to the measurement of poverty and living standards faces the lack of a universal measurement yardstick. At the country level, data on poverty are essential for research, policy formulation and implementation². They come primarily from sample surveys, but it is important to note from the outset that there are many facets of poverty that such data cannot capture. Indeed, different types of data gaps are likely to be encountered when formulating poverty profiles and assessments. Moreover, poverty is often regarded as a derived measure or latent variable because it is not directly observed on households or individuals; surveys ask questions about some of their features that can be used to evaluate poverty status.
One of the main criteria usually used to determine the sample size of nationwide surveys is to achieve a specified level of precision for a given domain; generally, this domain is the national territory or, at the finest level, large geographical areas of the country. In Italy, for example, the source of official statistics on poverty, provided by the National Statistics Institute (Istat), is the Household Budget Survey (HBS). It is designed to produce reliable poverty estimates at the national level or, at the finest level, for large geographical divisions (Northern, Central and Southern Italy). The main problem in producing poverty estimates at a finer degree of territorial disaggregation is the small size of the sample available at a sub-national level (i.e., for the regions or for even smaller units, which in Italy are the provinces). Such estimates are highly variable because of the sampling error, which increases as the size of the area sub-samples decreases³.

² Awareness of the existence of poverty in western societies has been increasing in recent years. Poverty is not a problem regarding only under-developed or developing countries; it also concerns developed societies, and "the poor" can be found in advanced economies, too. Social attitudes towards poverty are changing. In many western economies, where a high level of affluence has been attained, poverty could be eliminated without causing any significant hardship to the "non-poor", but it is highly probable that the poor will continue to need outside assistance to escape poverty or, at least, to reduce its intensity. The first problem is to identify "the poor" and to measure the intensity of their deprivation, so that methods can be devised to fight it.
³ The HBS, carried out by Istat since 1968, is a sample survey whose main objective is to collect information on the consumption patterns of private households in order to provide quarterly estimates of this aggregate. It is a repeated monthly cross-section, with each household interviewed only once. In order to improve the reliability of direct poverty estimates at a sub-national level, Istat has adopted several sampling strategies, such as increasing the HBS sample size and introducing a new set of questions about living conditions; nevertheless, the larger sample size involved some disadvantages in terms of cost and timeliness. In order to overcome the financial, organisational and methodological problems deriving from the increase in sample size, and to produce reliable poverty estimates at a sub-national level, new methodologies are being developed (Falorsi et al., 2003).

Sample size problems require a more sophisticated statistical approach than the simple use of direct estimators. Special estimators can be constructed that "borrow strength" from related areas across space and/or time, or through auxiliary information assumed to be correlated with the variable of interest (Rao, 2003). In the literature, this class of estimators is known as Small Area Estimators (SAE models). The aim of our paper is to improve the quality of the estimation process, in terms of efficiency, for some income poverty measures in the Italian provinces (NUTS3)⁴. Currently, in Italy, direct poverty estimates at the provincial level are not produced, for the reasons mentioned above. We therefore compute direct poverty estimates from ECHP (European Community Household Panel) data and then aim to improve them using SAE techniques. Indeed, area-specific indicators are required in order to design policies and to monitor the poverty situation: poverty and inequality measures are most useful to policy-makers and researchers when they are finely disaggregated. The adopted approach is based on Area Level Random Effect Models (Fay & Herriot, 1979) because auxiliary variables are available at the area level only. These models relate small area direct estimators to domain-specific covariates and treat the random area effects as independent. Under this model, the Empirical Best Linear Unbiased Predictor (EBLUP) is obtained⁵.

⁴ The Nomenclature of Territorial Units for Statistics (NUTS) is a territorial classification system developed by Eurostat. Following the geographical-administrative divisions of the European Union Member States, the NUTS system provides a hierarchical, exhaustive and non-overlapping set of units. It proceeds step by step from higher units (NUTS1 level) to lower ones (NUTS2 and NUTS3 levels), increasing the degree of disaggregation of the statistical indicators. In Italy, there are 11 areas at NUTS1 level corresponding to the main socio-economic macro-regions, 20 areas at NUTS2 level corresponding to the traditional Italian administrative regions, and 103 areas at NUTS3 level corresponding to the Italian provinces. The NUTS system provides an important framework for inter- and/or intra-country comparability. A provincial territory frequently shows a high degree of heterogeneity, as it often includes large and small municipalities, towns and countryside, plains and mountains. Therefore, it can be considered a partition, in all respects, of the national territory and an appropriate unit of comparison with the national values.

⁵ Since our analysis is restricted to the Italian context, where most structural variables are largely homogeneous across the national territory, we have preferred to adopt an absolute approach to estimating the small area models, instead of a hierarchical (or ratio) one. Under the ratio approach, all target variables and all covariates are expressed as ratios $R_{ijk} = Y_{ijk}/Y_{ij}$, where $Y_{ijk}$ and $Y_{ij}$ are the actual values of the variable for province $k$ and for its region $j$, belonging to NUTS1 area $i$, respectively. In this way, the difficulty of quantifying institutional and historical factors is abstracted away. The ratio approach is particularly helpful for comparative analysis at an international level, where it is important to take into account that there are substantial differences among countries with regard to several factors, both political and economic, and to the structure of the most important social systems (fiscal, education, labour, etc.).

Moreover, the work explores the territorial component of the poverty estimates, in order to identify some of the socio-economic factors contributing to poverty levels and living standards and to investigate the territorial perspective in depth. On the one hand, we test many possible covariates and, through a stepwise procedure, select those most highly correlated with the poverty measure to be estimated; on the other hand, we introduce into the small area models a qualitative variable reflecting the geographical location of the Italian provinces. Finally, we extend the analysis beyond the conventional measures of income poverty and also consider a fuzzy monetary measure that treats poverty as a matter of degree (Cheli & Lemmi, 1995; Cheli, 1995). By comparing and contrasting the conventional and fuzzy income poverty measures, the paper discusses the differences in the level and intensity of income poverty across Italian provinces.
This allows us to identify not only the individuals but also the areas that, more than others, need structural interventions. Small area estimation models are especially useful for predictions in provinces where survey data for direct estimates are absent. The procedure is to use the regression coefficients estimated by the corresponding EBLUP model to predict the dependent variables (poverty measures) from the selected predictors provided by the Istat database.

The article is organised as follows. In Section 2, we briefly describe the Fay-Herriot model adopted in our analysis. Section 3 deals with the data sources for direct and synthetic estimates, represented by the ECHP survey and by the database of Territorial Indicators of the National Statistics Institute (Istat), respectively. By combining the two data sources, we obtain composite estimates with higher efficiency than the corresponding direct estimates. Sections 4 and 5 illustrate, respectively, the monetary poverty measures, which are the target variables of the adopted models, and the evaluation of the sampling errors. The stepwise procedure for the selection of covariates is described in Section 6. Sections 7 and 8 are the main sections of the paper: they present the most important empirical results of our analysis and, in order to evaluate the performance of small area estimators, compute some outcome measures and quality indicators. Concluding remarks can be found in Section 9, where we also outline some further developments.

2 Area Level Random Effect Models and EBLUP estimators: theoretical and methodological view

As a rule, a domain is regarded as small if the domain-specific sample is not large enough to support direct estimates of adequate precision; such estimates are likely to have large standard errors due to the unduly small size of the sample in the area (Ghosh & Rao, 1994).
Two kinds of small area estimation methods can be identified: model assisted methods, in which the indirect estimators, including synthetic and composite estimators, are based on implicit models; and model based methods, in which the indirect estimators are based on explicit models that incorporate area-specific effects. Model based methods can be further classified into Area Level Random Effect Models (Fay & Herriot, 1979), used when auxiliary information is available only at the area level, and Nested Error Unit Level Regression Models (Battese et al., 1988), used when specific covariates are available at the unit level. As illustrated above, our study deals with Area Level Random Effect Models; they relate small area direct estimators to domain-specific covariates and treat the random area effects as independent. Let $X$ be the $(m \times p)$ matrix of the area-specific covariates (auxiliary variables) related to the target parameters (poverty measures), where $m$ is the number of small areas (provinces) and $p$ is the number of covariates in the model. For the $i$-th area, the model can be written as:

$$\theta_i = x_i^T \beta + z_i v_i, \qquad i = 1, 2, \ldots, m \qquad (2.1)$$

where $\theta_i$ is the target parameter for the $i$-th area, $x_i^T$ is the $i$-th row of $X$, $\beta$ is the $(p \times 1)$ vector of regression parameters, the $z_i$ are known positive constants, and the $v_i$ are independently and identically distributed random variables with zero mean and variance $\sigma_v^2$. Moreover, it is assumed that direct (design-based) estimators $\hat{\theta}_i$ are available:

$$\hat{\theta}_i = \theta_i + e_i, \qquad i = 1, 2, \ldots, m \qquad (2.2)$$

where the $e_i$ are independent sampling errors with zero mean and known variance $\psi_i$. The latter two conditions imply that the estimators $\hat{\theta}_i$ are design-unbiased.
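To make the structure of equations (2.1) and (2.2) concrete, the following Python sketch draws target parameters and direct estimates from the area-level model. It is only an illustration under stated assumptions: it uses NumPy, takes $v_i$ and $e_i$ to be normal (the model itself only fixes their means and variances), sets $z_i = 1$ by default, and the function name `simulate_fay_herriot` and all numeric inputs are hypothetical.

```python
import numpy as np


def simulate_fay_herriot(X, beta, sigma2_v, psi, z=None, seed=0):
    """Draw (theta, theta_hat) from the area-level model (2.1)-(2.2).

    Normality of v_i and e_i is an illustrative assumption, not part of the model.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    psi = np.asarray(psi, dtype=float)
    m = X.shape[0]
    z = np.ones(m) if z is None else np.asarray(z, dtype=float)
    v = rng.normal(0.0, np.sqrt(sigma2_v), size=m)  # iid area effects, variance sigma2_v
    e = rng.normal(0.0, np.sqrt(psi))               # sampling errors, variance psi_i
    theta = X @ np.asarray(beta, dtype=float) + z * v  # (2.1)
    theta_hat = theta + e                              # (2.2): design-unbiased direct estimates
    return theta, theta_hat


if __name__ == "__main__":
    m = 93  # e.g. the provinces with ECHP data; all other inputs are made up
    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(m), rng.normal(size=m)])
    theta, theta_hat = simulate_fay_herriot(
        X, beta=[0.2, 0.05], sigma2_v=0.001, psi=rng.uniform(0.0005, 0.003, m))
    print(theta_hat[:5])
```

Setting `sigma2_v=0` collapses $\theta_i$ onto the regression line $x_i^T\beta$, which is a quick way to check that only the sampling error then separates $\hat{\theta}_i$ from $\theta_i$.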
Combining the above two equations, the following linear mixed model is obtained (Fay & Herriot, 1979):

$$\hat{\theta}_i = x_i^T \beta + z_i v_i + e_i, \qquad i = 1, 2, \ldots, m \qquad (2.3)$$

It involves model errors $v_i$ as well as design-induced errors $e_i$; it assumes that $v_i$ and $e_i$ are independent and that their $(m \times m)$ covariance matrices are diagonal. Under the basic area level model, the best estimator of $\theta_i$, in the sense of minimizing the MSE, is given by:

$$\tilde{\theta}_i(\sigma_v^2) = \gamma_i \hat{\theta}_i + (1 - \gamma_i)\, x_i^T \hat{\beta} \qquad (2.4)$$

where $\hat{\beta} = \left[\sum_{j=1}^{m} x_j x_j^T (\sigma_v^2 z_j^2 + \psi_j)^{-1}\right]^{-1} \sum_{j=1}^{m} x_j \hat{\theta}_j (\sigma_v^2 z_j^2 + \psi_j)^{-1}$ is the weighted least squares estimator of $\beta$, and $\gamma_i$, called the shrinkage factor, is a measure of the between-area variability relative to the total variability associated with area $i$ (Rao, 2005). This parameter, which takes values in $[0, 1]$, measures the uncertainty in modelling the $\theta_i$: it is the weight given to the direct estimate in the final composite estimate, and it is given by:

$$\gamma_i = \frac{z_i^2 \sigma_v^2}{z_i^2 \sigma_v^2 + \psi_i} \qquad (2.5)$$

This estimator is also the Best Linear Unbiased Predictor (BLUP), and it is a weighted average of the direct estimator $\hat{\theta}_i$ and the regression-synthetic estimator $x_i^T \hat{\beta}$. Prasad & Rao (1990) give the mean square error of the BLUP estimator, which depends on the unknown variance parameter $\sigma_v^2$:

$$\mathrm{MSE}[\tilde{\theta}_i(\sigma_v^2)] = g_{1i}(\sigma_v^2) + g_{2i}(\sigma_v^2) \qquad (2.6)$$

with

$$g_{1i}(\sigma_v^2) = \sigma_v^2 z_i^2 \psi_i (\sigma_v^2 z_i^2 + \psi_i)^{-1} = \gamma_i \psi_i \qquad (2.7)$$

$$g_{2i}(\sigma_v^2) = (1 - \gamma_i)^2\, x_i^T \left[\sum_{j=1}^{m} x_j x_j^T (\sigma_v^2 z_j^2 + \psi_j)^{-1}\right]^{-1} x_i \qquad (2.8)$$

where the second term, $g_{2i}(\sigma_v^2)$, is due to estimating $\beta$ (Rao, 2003). In practice, the variance parameter $\sigma_v^2$ is replaced by a suitable estimator $\hat{\sigma}_v^2$; the resulting two-stage estimator $\tilde{\theta}_i(\hat{\sigma}_v^2)$ is called the Empirical BLUP (EBLUP). The EBLUP estimator is unbiased for $\theta_i$ if $E[\tilde{\theta}_i(\hat{\sigma}_v^2)]$ is finite and $\hat{\sigma}_v^2$ is any translation invariant estimator of $\sigma_v^2$ (Kackar & Harville, 1984). Assuming normality, the variance of the random effects can be estimated either by Maximum Likelihood (ML) or by Restricted Maximum Likelihood (REML) methods. The MSE of the EBLUP appears to be insensitive to the choice of the estimator $\hat{\sigma}_v^2$. Under normality of the random effects:

$$\mathrm{MSE}[\tilde{\theta}_i(\hat{\sigma}_v^2)] = \mathrm{MSE}[\tilde{\theta}_i(\sigma_v^2)] + E[\tilde{\theta}_i(\hat{\sigma}_v^2) - \tilde{\theta}_i(\sigma_v^2)]^2 \qquad (2.9)$$

where the last term, generally intractable, is approximated. The approximate form of the mean square error is given by:

$$\mathrm{MSE}[\tilde{\theta}_i(\hat{\sigma}_v^2)] \approx g_{1i}(\sigma_v^2) + g_{2i}(\sigma_v^2) + g_{3i}(\sigma_v^2) \qquad (2.10)$$

where $g_{2i}(\sigma_v^2)$ and $g_{3i}(\sigma_v^2)$ are of lower order than $g_{1i}(\sigma_v^2)$. For the ML and REML estimators, $g_{3i}(\sigma_v^2) = \psi_i^2 z_i^4 (\sigma_v^2 z_i^2 + \psi_i)^{-3}\, \bar{V}(\hat{\sigma}_v^2)$, with asymptotic variance $\bar{V}(\hat{\sigma}_v^2) = 2\left[\sum_{j=1}^{m} z_j^4 (\sigma_v^2 z_j^2 + \psi_j)^{-2}\right]^{-1}$ (Rao, 2003). In our analysis, $\sigma_v^2$ is estimated by REML; accordingly, an approximately unbiased estimator of this mean square error has been computed using the following expression:

$$\mathrm{mse}[\tilde{\theta}_i(\hat{\sigma}_v^2)] \approx g_{1i}(\hat{\sigma}_v^2) + g_{2i}(\hat{\sigma}_v^2) + 2 g_{3i}(\hat{\sigma}_v^2) \qquad (2.11)$$

3 Data sources for direct and synthetic estimates

In the current state of knowledge, the provincial-level information available seems to preclude the use of unit-level models, because the covariates are available at the area level only; auxiliary variables are usually aggregated at the area level. In our paper, three essential aspects have to be considered: first, making the best use of existing sample survey data, for example by pooling them to construct more robust measures at a finer degree of spatial disaggregation; second, exploiting the accessible data source of territorial indicators; third, combining the two sources through small area estimation techniques to produce the most complete estimates possible for the Italian provinces. Sample data for direct poverty estimates at NUTS3 level come from the ECHP survey, while the auxiliary information for synthetic poverty estimates comes from the Istat database. Composite poverty estimates are a linear combination of the two previous estimates; their efficiency depends on the specific situation and on the nature of the statistical data available.
3.1 ECHP and the pooling over waves

The European Community Household Panel (ECHP) has traditionally been the primary source of the data used by Eurostat and the National Statistics Institutes for the construction of many indicators in the field of income, poverty and social exclusion. Launched on the basis of a gentleman's agreement, it was designed as a multidimensional sample survey. Indeed, during the period 1994-2001, it was, in several European Union Member States, the only information source covering a large range of topics, and it provided detailed information on the economic activities of individuals and on the single components of net monetary income at the household and individual levels over the whole of the previous calendar year. The ECHP survey design is based on a pure longitudinal panel: the sample selected for the first year of the survey was followed up throughout the subsequent duration of the survey (8 years), wherever the sample units moved. In terms of completed interviews, the total Community sample size for the twelve countries in 1994 was slightly over 60,000 households and approximately 127,000 adults aged 16 years and over; the initial Italian sample amounted to 7,115 respondent households. Furthermore, spatial comparability is achieved in the ECHP through a standardised design and common technical and implementation procedures, with coordination of the national surveys by Eurostat. This is an input harmonization, because all aspects of the production process are defined and implemented in a uniform way across the countries⁶. ECHP data are not normally used as a basis for sub-national estimates, on account of the small size of the sample. In our research, we adopt a procedure that estimates direct poverty measures at the provincial level by pooling all suitable ECHP waves (1994-2001) (Kish, 1990, 1999; Verma, 2002).
Such a technique makes it possible to reduce the variability of direct poverty estimates by means of a larger sample. In other words, measures constructed from pooled data tend to be more robust than results based on a single wave (European Commission, 2005)⁷. Marker (2001), for example, studied the level of accuracy of state estimates obtained by combining the 1995 United States National Health Interview Survey (NHIS) sample with the previous year's sample or with the previous two years' samples.

⁶ By contrast, in output harmonization, the activities are limited to standardising the results of the statistical production process. Two significant examples are the Cross-National Equivalent File (CNEF) and the Luxembourg Income Study (LIS). The former is a longitudinal micro-database administered at Cornell University which provides comparably defined variables for use in cross-national research; it includes four national panel surveys: the British Household Panel Study (BHPS), the German Socio-Economic Panel (GSOEP), the Canadian Survey of Labour and Income Dynamics (SLID) and the United States Panel Study of Income Dynamics (PSID). The latter has provided harmonized cross-national household income micro-data for social science research for over fifteen years.

⁷ A similar approach was used in a study for the Bank of Italy by Cannari & D'Alessio (2003), based on SHIW data. SHIW is the acronym of the Survey of Households' Income and Wealth, a biennial split-panel sample survey carried out by the Bank of Italy. Its sample size is so small that it does not allow reliable estimates of income and wealth at NUTS2 level. The study presents an experimental estimation of these regional aggregates for the period 1995-2000, obtained by combining three SHIW waves. The pooled period is substantially homogeneous from a macro-economic viewpoint.
He showed that aggregation considerably improves the estimates of some selected variables. Of course, in a pure panel survey, like the ECHP, or in a split panel, like the Italian Survey of Households' Income and Wealth (SHIW, Bank of Italy), the core of the sample is a set of the same individuals, so that data from different waves are correlated. Therefore, we cannot simply add up the sample sizes over waves; in the estimation procedure, we have to take into account the correlation in the poverty measures over time in order to evaluate its impact on the variability of the estimates. As a matter of fact, "in presence of correlation amongst phenomena observed through contiguous waves, the estimator considering the correlation is more efficient than the estimator where that correlation is not considered" (Cannari & D'Alessio, 2003)⁸. Therefore, we test the level of positive correlation in the poverty measures over waves as well as the extent to which it reduces the gain from cumulation. For example, when two adjacent ECHP waves, with proportions of poor $p$ and $p'$ respectively, are aggregated, the variance of the combined estimate depends directly on the correlation in the poverty measures between the two periods. This correlation is expected to decline as the two waves become more widely separated. Combining consecutive annual ECHP samples nevertheless improves the estimates, although the correlation between years reduces the "effective" sample size for overall statistics (Kish, 1990). Thus, taking the correlation over waves into account, we can estimate a factor ($f_v$) measuring the reduction in sampling error due to the consolidation of poverty measures over two or more periods. In our work, $f_v$ is given by the ratio between the variance of an average over the eight waves and the variance of the same estimate from a single wave.
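The factor $f_v$ can be illustrated with a short Python sketch. It assumes, hypothetically, that the single-wave estimates have equal variance and that the correlation between waves $t$ and $s$ decays as $\rho^{|t-s|}$ (an AR(1)-type pattern, in line with the expectation that correlation declines as waves become more widely separated); the actual ECHP correlation structure is not specified here, and the function name `pooling_factors` is illustrative. The variance ratio is $f_v$ as defined above, its reciprocal is the effective number of waves, and its square root is the corresponding ratio of standard errors.

```python
import math


def pooling_factors(n_waves, rho):
    """Return (f_v, standard-error ratio, effective number of waves) for the simple
    average of n_waves equally precise wave estimates whose correlation is
    rho ** |t - s| (hypothetical AR(1)-type structure)."""
    if n_waves < 1 or not 0.0 <= rho <= 1.0:
        raise ValueError("need n_waves >= 1 and 0 <= rho <= 1")
    # Var(average) / Var(single wave) = (1 / T^2) * sum_t sum_s rho^|t - s|
    total = sum(rho ** abs(t - s) for t in range(n_waves) for s in range(n_waves))
    f_v = total / n_waves ** 2
    return f_v, math.sqrt(f_v), 1.0 / f_v


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.8, 1.0):
        f_v, se_ratio, eff = pooling_factors(8, rho)
        print(f"rho={rho:.1f}  f_v={f_v:.3f}  SE ratio={se_ratio:.3f}  effective waves={eff:.2f}")
```

With independent waves ($\rho = 0$) the eight waves count fully, while with perfect correlation ($\rho = 1$) pooling brings no gain. As an arithmetic aside, an effective number of waves of 2.76 corresponds to a variance ratio of $1/2.76 \approx 0.36$ and a standard-error ratio of $\sqrt{0.36} \approx 0.60$, so the two figures reported for the Italian section in the following paragraphs appear mutually consistent if the 0.60 factor is read as a ratio of standard errors.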
For the Italian section of the ECHP, the efficiency gain factor for Head Count Ratio estimates resulting from cumulation over waves is 0.60 (European Commission, 2005: 155). This factor is considered country-specific and more or less independent of the particular variable. However, pooling highlights the underlying structural relationship of real interest; in this way, poverty measures constructed on pooled data tend to be more stable than those defined on one wave only. As anticipated above, the effective sample size after pooling over waves is also strongly influenced by the correlation of the data over time: cumulating the same elements (persons, households) does not increase the sample size proportionately (Kish, 1990). Nevertheless, there is a significant increase in the ECHP effective sample size due to real variation over time in the composition of the sample and in the characteristics of individuals and households, and also due to response variability and other random effects, which tend to be balanced out in the averaging process (European Commission, 2005).

⁸ After pooling, the estimates (e.g., the mean of a generic variable $y$ over a period of $T$ waves) can be obtained as a weighted average of the annual estimates. "The estimator, being a linear combination of unbiased estimators, is unbiased and optimum, in terms of standard errors, when the annual estimates are independent" (Cannari & D'Alessio, 2003). This condition is not satisfied when all, or a subset, of the individuals are re-interviewed over time, because of the correlation in the variables observed through contiguous waves. With regard to income and wealth, Cannari & D'Alessio (2003) estimated a positive correlation between 0.5 and 0.7. In these cases, the variability of the estimates is higher than under the hypothesis of no correlation over time.
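The weighted average described in footnote 8, and the effect of correlation on its variance $w^T \Sigma w$, can be sketched in Python. The sketch assumes NumPy, inverse-variance weights (optimal when the annual estimates are independent) and, as a simplification, a common correlation $\rho$ between every pair of waves; the function name `pooled_estimate` and the input values are hypothetical. With equal variances $\sigma^2$ and equal weights, the variance reduces to $\sigma^2[1 + (T-1)\rho]/T$, so a correlation in the 0.5 to 0.7 range substantially inflates it relative to the independence case.

```python
import numpy as np


def pooled_estimate(estimates, variances, rho):
    """Pool T annual estimates with inverse-variance weights and return
    (pooled estimate, its variance) when every pair of waves has correlation rho
    (hypothetical equicorrelation structure)."""
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    T = est.size
    w = (1.0 / var) / np.sum(1.0 / var)   # weights sum to 1, so unbiasedness is preserved
    sd = np.sqrt(var)
    R = np.full((T, T), float(rho))
    np.fill_diagonal(R, 1.0)              # correlation matrix of the annual estimates
    cov = R * np.outer(sd, sd)            # covariance matrix Sigma
    return float(w @ est), float(w @ cov @ w)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.7):
        p, v = pooled_estimate([0.19, 0.20, 0.21], [1e-4, 1e-4, 1e-4], rho)
        print(f"rho={rho}: pooled={p:.3f}, variance={v:.2e}")
```

The weights are kept fixed across values of $\rho$ to isolate the variance inflation; a generalized least squares weighting that exploits the correlation would be the more efficient alternative mentioned in the quotation above.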
For the Italian section of the ECHP, the effective number of waves resulting from the aggregation is not equal to the original number of cumulated waves (eight) but is 2.76. Subsequently, in order to assess the time effect and, hence, the advisability of pooling the longitudinal data over all ECHP waves, we test several possibilities, following Ferrante et al. (2004) and Fabrizi et al. (2005). In particular, we note that the effect of considering more contiguous waves is very significant when no auxiliary information is used: it allows a noteworthy gain in efficiency and, as expected, the introduction of some covariates further increases the efficiency of the estimators. Similarly, re-estimating the same models with the same covariates but with direct poverty measures defined on only one ECHP wave (year 2001), we obtain lower estimator efficiency than in the previous results. Although the inclusion of covariates makes the effect of aggregating adjacent waves less relevant than in the case without covariates, the effect does not disappear; data pooling still allows a significantly larger gain in efficiency⁹. In order to assess the gain in efficiency that could be achieved by borrowing strength across both small areas and time when sample surveys are repeated over time, Rao and Yu (1994) and, subsequently, other authors (Ghosh et al., 1996; Datta et al., 2002) proposed extensions of the basic Fay-Herriot model (1979). In those models, the $\theta_{it}$ depend on both area-specific effects and area-by-time specific effects, which are correlated across time for each $i$ (Rao, 2003). We intentionally neglected these aspects here, but we intend to examine them closely in future work, also in order to compare them with the results summarised in this paper. The majority of the 103 Italian provinces provide ECHP data over all waves.
As a matter of fact, there are only 10 provinces for which survey data for direct estimates are not available; they are therefore not included in the Fay-Herriot models, but their poverty measures can be predicted by the regression-synthetic estimators. Another 11 Italian provinces have fewer than 8 ECHP waves; for these provinces the sample size is very small and generally below 30 units¹⁰.

⁹ Leaving out the effect of data pooling, the means of the ratios MSE(EBLUP estimate)/MSE(direct estimate) are 0.8105, 0.8187, 0.8191 and 0.7659 for HCR_I, HCR_NUTS2, LogEquInc and FM, respectively. As illustrated later, these correspond to gains in estimator efficiency that are significantly smaller than those obtained when data are pooled.

3.2 Istat database as source for area-level provincial indicators

The Istat database of Territorial Indicators, freely available on-line, provides a valuable data source for the construction of sub-national indicators. It covers fifteen subject-matter areas with many spatial indicators, some of them disaggregated at regional and provincial level. They are level measures to be used as indicators of demographic, social and economic disparities across regions or provinces. The database therefore allows an in-depth investigation of the Italian territory, which makes it a very important source for economic research and policy analysis. In addition, the availability of time-series data allows us to analyse some economic phenomena over time at a sub-national level. We have chosen the Istat database for the selection of territorial indicators for several reasons. It is easily accessible and convenient to use and, most importantly, it provides a rich body of social and economic covariates that may "explain" poverty in relation to the characteristics of the provincial area. Some of these covariates, used in conjunction with direct indicators, contribute notably to improving the efficiency of the estimation of some poverty measures. Such an analysis allows us to determine some of the socio-economic factors contributing to poverty levels and living standards, and to investigate in depth the territorial perspective in poverty analysis at a sub-national level. The search for and selection of small area indicators are two very important phases of our procedure. Among all the Istat provincial indicators that can be used as regressors to produce more precise poverty measures with small area estimation techniques, we have chosen the following: Activity Rate, Employment Rate, Unemployment Rate, Population Density, Resident Population per 100 inhabitants, Index of Territorial Concentration of the Resident Population, Net Migratory Rate, Hospitalization Rate, Public Hospitalization Rate, Crude Birth Rate, Crude Death Rate, Infant Mortality Rate, Marriage Rate, Crime Rate, Suicides per 100,000 inhabitants, Legal Separation Rate, Divorce Rate, Gross Domestic Product (GDP), Growth Enterprises Rate (net of agriculture).

4 Monetary poverty measures as target variables

In our study, we take four monetary poverty measures as dependent variables in the small area estimation models. As illustrated above, they refer to the Italian context at NUTS3 level.

¹⁰ Italian provinces without ECHP data are Vercelli (Piedmont), Sondrio, Mantua (Lombardy), Ascoli Piceno (Marches), Rieti (Latium), Isernia (Molise), Vibo Valentia (Calabria), Caltanissetta, Enna and Ragusa (Sicily). L'Aquila (Abruzzo) and Imperia (Liguria) are two extreme cases of Italian provinces with a very small ECHP sample size: the former has one unit in one wave, while the latter has three units in one wave. The provinces of Arezzo and Sienna (Tuscany) have a sample size of more than twenty but fewer than thirty individuals.
Other provinces with a very small ECHP sample size are Asti, Biella (Piedmont), Lodi (Lombardy), Rimini, Ferrara (Emilia-Romagna), Massa-Carrara and Prato (Tuscany).

Table 1: Monetary Poverty Measures; Target Variables for Small Area Estimation Models.

| # | Variable | Description |
|---|---|---|
| 1 | HCR_I | Head Count Ratio – Country Poverty Line |
| 2 | HCR_NUTS2 | Head Count Ratio – NUTS2 Poverty Line |
| 3 | LogRedEqu (LogEquInc) | Mean of Logarithm of Equivalised Income |
| 4 | FM | Fuzzy Monetary |

In the traditional approach to the measurement of poverty, the most commonly used income poverty measure is the Head Count Ratio, that is, the proportion of poor individuals in the total population. A poor individual is defined as an individual whose equivalised income is below a poverty line11. In our work, the Head Count Ratios have been computed with respect to two poverty lines: one defined on the income distribution at the country level, and one defined on the income distribution separately within each NUTS2 region. In both cases, the poverty line is equal to 60 per cent of the median of the equivalised income distribution12.

The Head Count Ratio captures the incidence of poverty, but it says nothing about the intensity of deprivation among the poor, about the relative deprivation stemming from income inequality between the poor and the non-poor, or about the disparity between the means of the two sub-populations (Dagum, 2002). The traditional approach to poverty analysis, based on a rigid dichotomisation between the "poor" and the "non-poor", tends to oversimplify reality: it wipes out all the nuances between the two opposite extremes of distinct material hardship and substantial welfare13. In other words, poverty should be considered as a matter of degree rather than an attribute that is simply present or absent for individuals in the population (Cheli & Lemmi, 1995). In principle, all individuals are subject to poverty, but to varying degrees.

11 For each individual, the equivalised income is defined as total household income divided by the equivalised household size, obtained according to the modified-OECD scale. This scale gives a weight of 1 to the first adult, 0.5 to other household members aged 14 or over and 0.3 to household members aged under 14. Each person in the same household receives the same equivalised income.

12 To identify the poverty threshold, we referred to the criterion used by Eurostat. In order to limit the researcher's discretion, the 2nd Report on Income, Poverty and Social Exclusion in Europe (2002) recommends setting the poverty line at 60% of the median of the equivalised income distribution. This Report also illustrates some poverty measures obtained with thresholds at 40, 50 and 60 per cent of the mean, and at 50 and 70 per cent of the median.

13 "Given a two-adult household with income y = Z - e, where e is an infinitesimal, and another with null income, each of them is counted as a poor household. On the other hand, a two-adult household with income y = Z + e and another with income y = 1000Z are counted as two non-poor households. That is a very strong limitation of the univariate approach to the analysis and measurement of poverty and its policy implications" (Dagum, 2002). In Italy, the National Statistics Institute partly reduces that problem. Beyond the official poverty line, it defines two other poverty thresholds, equal to 80 and 120 per cent of the consumption per head, respectively. In this way, it produces two further categories of poor: the almost poor and the just poor.

In addition to the conventional poverty measures, our paper discusses the propensity to income poverty (Fuzzy Monetary), which is determined by the individual's rank in the income distribution and by the individual's share of the total income received by the population14. In our work, the Fuzzy Monetary measure, based on the distribution function F(·) of the equivalised income, is defined as (Cheli, 1995):

$$FM_i = \left[1 - F(y_i)\right]^{\alpha}, \qquad \alpha \ge 1 \tag{4.1}$$

where y_i is the household equivalised income of the i-th observed unit, while the exponent α represents "the weight of the poorest ones compared to the less poor" (Lemmi et al., 1997). In order to guarantee comparability between conventional and fuzzy poverty analysis, the parameter α is chosen so that the expected value of the membership function equals the proportion of poor individuals given by the Head Count Ratio (Cheli & Betti, 1999).

5 Evaluating the sampling errors

To evaluate a small area estimation model using a procedure such as EBLUP, it is essential to analyse how the mean-squared errors of the composite poverty estimates are produced. Composite estimates are a weighted mixture of direct and synthetic estimates. The former are derived from ECHP survey data for the small areas concerned, taking the sampling design into account; the latter are derived by fitting an appropriate small area model using some of the Istat territorial indicators, disaggregated at NUTS3 level. The weights (γ_i) of the linear combination depend on the design variance of the direct estimates and on the model variance of the synthetic estimates.

14 The conventional approach is a special case of the fuzzy one, with the population dichotomised as 1 if poor and 0 if not. Individuals with an income below a certain threshold are deemed "poor", with a constant propensity equal to 1; those with an income at or above that threshold are deemed "non-poor", with a constant propensity equal to 0. These concepts can be extended to cover non-monetary aspects of living standards in the form of "Fuzzy Supplementary" measures.
In this way, in addition to the level of monetary income, the living standards of households and individuals can be described by a host of indicators, such as possession of durable goods, housing conditions, expectations, general financial situation, perception of hardship, etc. Naturally, quantifying a large set of non-monetary indicators of living conditions involves a large number of steps, models and assumptions. Moreover, income poverty and non-monetary deprivation can also be studied in combination to construct other composite measures indicating the extent to which the two aspects, income poverty and lifestyle deprivation, overlap for the individual concerned. These aspects are not considered in the present study; we intend to examine them closely in future work.

Sampling variance, or its square root, the standard error, measures the variability of the direct estimates that arises because they are based on only part of the population. It depends on the sample design and size; for that reason, it increases as we move to smaller domains with smaller sample sizes. Model variance measures the variability between the direct survey estimates and the model estimates based on the predictor variables; it is important for testing how well the model fits the data.

Estimating the sampling error, which is essential to define the weight of the direct estimator in the mixture of the two estimators, is very complicated, because some poverty and deprivation measures (particularly fuzzy measures) are much more complex than ordinary proportions, means and ratios; moreover, the sample designs on which they are based are usually very complex. Another difficulty comes from the fact that the estimates of sampling errors are themselves subject to variability, which increases as we move to smaller domains with smaller sample sizes.
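The target variables of Section 4, whose sampling errors are discussed here, can be illustrated with a minimal sketch for a single domain. It assumes that unit-level equivalised incomes and positive sampling weights are available, and applies the modified-OECD scale of footnote 11, a poverty line at 60 per cent of the weighted median, the Head Count Ratio, and the Fuzzy Monetary measure (4.1) with α found by bisection so that the weighted mean of FM_i equals the HCR, in the spirit of Cheli & Betti (1999). Estimating F(·) by the weighted empirical distribution function, the lower-median rule and all function names are our own illustrative assumptions, not the authors' R code.

```python
import numpy as np


def equivalised_income(household_income, n_aged_14_plus, n_under_14):
    """Household income divided by the modified-OECD equivalised size."""
    if n_aged_14_plus < 1:
        raise ValueError("the household needs at least one member aged 14 or over")
    size = 1.0 + 0.5 * (n_aged_14_plus - 1) + 0.3 * n_under_14
    return household_income / size


def weighted_median(x, w):
    """Smallest value whose cumulative weight reaches half of the total weight."""
    x, w = np.asarray(x, dtype=float), np.asarray(w, dtype=float)
    order = np.argsort(x)
    cum = np.cumsum(w[order])
    return x[order][np.searchsorted(cum, 0.5 * cum[-1])]


def head_count_ratio(y, w, line):
    """Weighted share of individuals whose equivalised income is below the line."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    return np.sum(w * (y < line)) / np.sum(w)


def fuzzy_monetary(y, w, target_hcr, n_iter=200):
    """FM_i = [1 - F(y_i)]**alpha, alpha >= 1 calibrated so mean(FM) = target_hcr."""
    y, w = np.asarray(y, dtype=float), np.asarray(w, dtype=float)
    if not 0.0 < target_hcr < 1.0:
        raise ValueError("target_hcr must lie strictly between 0 and 1")
    order = np.argsort(y)
    y_sorted = y[order]
    cum = np.cumsum(w[order])
    cum = cum / cum[-1]                       # last value is exactly 1
    # weighted empirical distribution function; tied incomes share the same F
    base = 1.0 - cum[np.searchsorted(y_sorted, y, side="right") - 1]

    def mean_fm(alpha):
        return np.sum(w * base**alpha) / np.sum(w)

    if mean_fm(1.0) <= target_hcr:            # alpha is constrained to be >= 1
        return base, 1.0
    lo, hi = 1.0, 2.0
    while mean_fm(hi) > target_hcr:           # mean_fm decreases as alpha grows
        hi *= 2.0
    for _ in range(n_iter):                   # bisection on alpha
        mid = 0.5 * (lo + hi)
        if mean_fm(mid) > target_hcr:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return base**alpha, alpha


# hypothetical domain: ten individuals with equal weights
y = np.array([4, 9, 14, 21, 25, 30, 35, 40, 45, 50], dtype=float)
w = np.ones_like(y)
line = 0.6 * weighted_median(y, w)            # 60% of the median income
hcr = head_count_ratio(y, w, line)            # three of ten below the line
fm, alpha = fuzzy_monetary(y, w, hcr)
```

For the HCR_NUTS2 variant, the same median and poverty line would simply be computed within each NUTS2 region rather than over the whole country.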
When some simplifying assumptions can be adopted, the standard error estimate at a sub-regional level may be factorised into several components, each representing some aspect of the complexity of the sampling design and of the estimation procedure (weighting, stratification and clustering, pooling over waves, etc.). There is considerable empirical evidence suggesting that many of these factors act more or less independently of each other (Verma et al., 1980; 1993). In particular, for the Head Count Ratio and assuming that the factor effects are multiplicative (Verma & Thanh, 1996), we can break down the standard error (sterr_v) into the following factors (Verma & Betti, 2005):

$$\mathrm{sterr}_v = se_v \times k_v \times d_v \times f_v \tag{5.1}$$

The first factor (se_v) is the standard error that would be obtained from a simple random sample of the same size, without the complexities represented by the other factors. Neglecting minor factors such as the finite population correction, se_v increases as the sample size n_v decreases:

$$se_v = \frac{sd_v}{\sqrt{n_v}} \tag{5.2}$$

where sd_v is the standard deviation in the population. Independently of the sample design or size, for a simple proportion p (e.g., the HCR15) sd_v is defined as:

$$sd_v = \sqrt{p\,(1-p)} \tag{5.3}$$

15 Actually, since the Head Count Ratio is defined in terms of a poverty line which is itself subject to sampling variability, it is a more complex statistic than a simple proportion. However, empirical results (Berger & Skinner, 2003; Verma, 2004) indicate that the standard deviation defined above still provides a reasonable approximation for it. In other words, for HCR = p and domain sample size n_i, we can approximate its simple random sample standard error as $\sqrt{p(1-p)/n_i}$.

A summary measure of the effect of the various design complexities on sampling error is the design factor or its square, the design effect. It can be decomposed into two components (Gagliardi et al., 2006): the effect of sample weights on variance (Kish factor, k_v) and the effect of clustering, stratification and other aspects of the design (d_v). Firstly, weighting is often determined by external factors (e.g., the need to over-sample small regions, or compensation for high non-response in particular sample areas). Such weighting, essentially uncorrelated with the survey variables, tends to inflate the sampling error of the various estimates (Verma & Thanh, 1996), independently of the structure of the sample. Kish (1965, 1989) gives a very simple expression for estimating the effect of arbitrary weights:

$$k_i = \sqrt{\frac{n_i \sum_j w_j^2}{\left(\sum_j w_j\right)^2}} = \sqrt{1 + CV_i^2} \tag{5.4}$$

where n_i is the number of sample units in the i-th domain and CV_i is the coefficient of variation of the individual weights w_j in that domain. Secondly, the design factor d_v depends on the structure of the sample as well as on the variable being estimated. With regard to ECHP data, the European Commission Report (2005) provides some additional information on the effects of sample weights and on design effects, averaged over household income related variables16. Finally, as illustrated in Section 3.1, the factor f_v measures the decrease in standard error obtained by consolidating the poverty measures over the eight ECHP waves. In this way, the poverty measures show less extreme variation than results based on a single wave17.

16 As a matter of fact, the Report summarises the Kish and design factors for income-related variables for the European Union Member States involved in the ECHP project. For the Italian section of the ECHP, the Kish and design factors are equal to 1.13 and 1.86, respectively, with a joint effect equal to 2.10.

17 Sometimes, it is possible to factorise the standard error estimate at a sub-regional level into three further components, g_v, r_v and s_v. First, g_v represents the gain in efficiency achieved by using different poverty thresholds (such as 50, 60 and 70 per cent of the median income) and then taking an appropriately weighted average of the results. In this work, g_v is equal to 1 because this synthesis procedure is not applied; poverty measures have been computed with reference to a single poverty threshold. Second, r_v measures the impact of a hierarchical model (Ratio Approach) on the efficiency of the sub-regional poverty estimates. Third, the sub-population factor s_v compares the increase in sampling error due to the reduced sample size when only a sub-population of interest is considered (e.g., children or elderly persons) with the sampling error for the whole population.

With regard to the other income poverty measures considered in our work (LogEquInc and FM), the main differences concern the computation of the standard deviation. Given the generally small range of the design effects (k_v·d_v), a common value can be used for the set of income poverty variables of interest and across the different regions of the country (European Commission, 2005). Obviously, for the Fuzzy Monetary measure the standard deviation is computed with reference to the proportion p_v, which may differ significantly from the proportion p for the Head Count Ratio.

6 The stepwise procedure for the selection of covariates

With regard to the choice of covariates for building the small area models, after verifying the availability of the territorial indicators for every Italian province, we analyse the correlation matrix of all possible covariates. We are also interested in evaluating their degree of association with each of the dependent variables (poverty measures). In order to avoid multicollinearity problems, we do not include all the available covariates in each model.
In the case of strongly correlated covariates, other conditions being equal, we select, according to a stepwise procedure, the ones more strongly correlated with the poverty measure being estimated. Sometimes we also included in the model some statistically non-significant covariates, because they also contribute to improving the efficiency of some poverty estimates.

The auxiliary variables matrix is composed of 19 territorial indicators referring to 2001. There are two main reasons why we chose 2001 as the reference year. Firstly, it is the year of the census surveys, and most of the territorial indicators are updated on the census results; in this way, we have more up-to-date information. Secondly, the direct estimates come from the ECHP survey, which was completed exactly in 2001.

Our analysis underlines the high positive correlation, significant at the 95% level, between Activity Rate and Employment Rate, and the high negative correlation between Employment Rate and Unemployment Rate. The correlation between Activity Rate and Unemployment Rate is lower than the previous ones; this justifies including these two indicators jointly in the small area models or, alternatively, the Employment Rate by itself when it explains a larger share of the variability of the target variable. We underline the high level of concordance, significant at the 95% level, amongst all the variables reflecting the territorial distribution of the population; in particular, we highlight the concordance, significant at the 95% level, between Resident Population per 100 inhabitants and Index of Territorial Concentration of the Resident Population. On the basis of the above criterion, we select only one of the three demographic variables, namely the one with the highest explanatory power for the variability of the dependent variable.
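The selection rule described above (among strongly inter-correlated covariates, keep the one most correlated with the poverty measure) can be sketched as a greedy correlation screen. This is a minimal illustration, not the full stepwise regression the authors ran: the function name, the 0.8 cut-off on absolute pairwise correlation and the toy data are hypothetical.

```python
import numpy as np


def screen_covariates(X, y, names, max_abs_corr=0.8):
    """Greedy screening: visit covariates from the most to the least correlated
    with the target and keep one only if it is not strongly correlated with any
    covariate already kept (max_abs_corr is an assumed threshold)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # absolute correlation of each covariate with the poverty measure
    r_target = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    r_between = np.corrcoef(X, rowvar=False)   # covariate correlation matrix
    kept = []
    for j in np.argsort(-r_target):            # most correlated with y first
        if all(abs(r_between[j, k]) < max_abs_corr for k in kept):
            kept.append(j)
    return [names[j] for j in kept]


rng = np.random.default_rng(0)
a = rng.normal(size=50)                # e.g. an "Employment Rate"-like covariate
b = a + 0.01 * rng.normal(size=50)     # nearly collinear with a
c = rng.normal(size=50)                # roughly independent of both
y = 2.0 * a + c                        # hypothetical poverty measure
selected = screen_covariates(np.column_stack([a, b, c]), y, ["a", "b", "c"])
```

Only one of the two near-duplicates survives, which mirrors the treatment of pairs such as Activity Rate and Employment Rate; the final choice in the paper also weighed significance and efficiency gains, which this screen ignores.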
Moreover, there is a high positive correlation between Hospitalization Rate and Public Hospitalization Rate on the one hand, and between Legal Separation Rate and Divorce Rate on the other. A high degree of concordance, significant at the 95% level, exists between Net Migratory Rate and Gross Domestic Product. In these cases, the variables are treated as mutually exclusive in the small area models. Finally, the correlations amongst the other covariates are rather small and sometimes have a low significance level.

The Istat official classification divides the Italian territory into five macro regions: the North-East, the North-West, the Centre, the South and the Islands. In order to explore the territorial contribution to poverty estimates, we introduce into the small area models a qualitative variable reflecting the location of each Italian province in one of the macro regions. Four dummy variables (for the North-East, the North-West, the Centre and the South, respectively) are needed to exhaust the information contained in the original qualitative scale. Using binary (0,1) coding, all provinces belonging to a particular geographical area are assigned a code of 1; provinces not in that area receive a code of 0. Following this convention, every province outside the Islands is coded 1 on one and only one dummy variable in the set. Provinces belonging to the "Islands" area are easily identified, since they have a code of 0 on all the dummy variables. That category, which has no dummy variable of its own, is our reference group18. In this way, we determine some of the socio-economic factors contributing to poverty levels and we are able to investigate in depth the territorial perspective in poverty analysis at a provincial level.

7 Composite estimates of income poverty measures

Each SAE model is composed of two steps.
In the input step, we construct the direct estimates, in absolute terms, of each poverty measure for every Italian province, together with the corresponding standard errors. They are computed from the ECHP data, and a larger sample size is achieved by pooling over the available waves (1994-2001). In the output step, we obtain the EBLUP composite estimates, in absolute terms, of each poverty measure for every Italian province, together with the corresponding mean-squared errors (MSE). The model has been estimated separately for each poverty measure.

18 We chose the geographical area "Islands" as the reference group simply because we noted that the direct and composite poverty estimates for the island provinces are significantly higher than the others. This is particularly true for the Head Count Ratio defined with respect to the income distribution at the country level (HCR_I) and, as a mirror image, for the Logarithm of Equivalised Income (LogRedEqu). We also verified that the results are coherent with those obtained when a different geographical area is used as the reference group.

In order to evaluate the performance of the estimation process through small area models and, consequently, the contribution of auxiliary information to the composite poverty estimates, we define some outcome measures. Firstly, the model parameter gamma (shrinkage factor, γ), that is, the ratio between the model variance and the total variance; it is the weight given to the direct survey estimate in the final composite estimate. Secondly, we compute the ratio between the EBLUP estimates and the corresponding direct estimates, and the ratio between the mean-squared error of the EBLUP estimates and that of the direct survey estimates.
The former allows us to test the extent to which the modelling has modified the input direct estimates; the latter measures the improvement in the accuracy of the estimates provided by modelling19. Both the ratio EBLUP Estimate/Direct Estimate and the ratio MSE(EBLUP Estimate)/MSE(Direct Estimate) have unity as a benchmark. In particular, for the second ratio, which is also an average measure of the goodness of fit of the small area models, we expect values lower than 1. For each of the above outcome measures, we compute several summary statistics: the mean value over all NUTS3 areas in the model, the coefficient of variation (CV) of those values, and the minimum and maximum values.

19 Analysis and data processing have been carried out with R, a software environment for statistical computing and graphics. R is an open-source implementation of the S language developed by John Chambers and other researchers at AT&T Bell Laboratories. R is freely available to researchers, programmers and users at http://cran.r-project.org.

To check the extent to which the modelling has improved the efficiency and precision of each poverty estimate, we use, as a synthetic measure, the complement to one of the mean of the province-level ratios MSE(EBLUP Estimate)/MSE(Direct Estimate). The degree of improvement in the accuracy of the poverty estimates is likely to be correlated with the degree of association between the auxiliary and target variables.

In particular, for the target variable Head Count Ratio defined with respect to the income distribution at the country level (HCR_I), the mean ratio EBLUP Estimate/Direct Estimate is close to unity (1.1156); this shows that, on average, the modelling adjustments to the direct estimates offset one another across provinces20. Furthermore, what is interesting is the large gain in efficiency deriving from modelling, equal on average to 0.3438 (that is, 1 - 0.6562).
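The quantities defined above (shrinkage factor γ_i, EBLUP composite estimate and its MSE) can be sketched for the area-level Fay-Herriot model. This is a minimal sketch, assuming the sampling variances ψ_i are known (for instance, the squared factorised standard errors of Section 5); it uses the Prasad-Rao moment estimator of the model variance and the Prasad-Rao MSE approximation, which is our choice for illustration, since this section does not state which variance estimator the authors used in R. The function name, dictionary keys and toy data are hypothetical.

```python
import numpy as np


def fay_herriot_eblup(direct, psi, X):
    """Area-level Fay-Herriot EBLUP (minimal sketch).

    direct : (m,) direct estimates, one per province with survey data
    psi    : (m,) sampling variances of the direct estimates, treated as known
    X      : (m, p) area-level covariates, including a column of ones
    """
    direct = np.asarray(direct, dtype=float)
    psi = np.asarray(psi, dtype=float)
    X = np.asarray(X, dtype=float)
    m, p = X.shape

    # Prasad-Rao moment estimator of the model variance sigma_v^2, truncated at 0
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = direct - X @ (XtX_inv @ X.T @ direct)
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    sigma2 = max(0.0, (resid @ resid - np.sum(psi * (1.0 - leverage))) / (m - p))

    # weighted least squares for beta given sigma_v^2 (regression-synthetic part)
    wts = 1.0 / (sigma2 + psi)
    A_inv = np.linalg.inv(X.T @ (wts[:, None] * X))
    beta = A_inv @ (X.T @ (wts * direct))
    synthetic = X @ beta

    # shrinkage factor: model variance over total variance
    gamma = sigma2 / (sigma2 + psi)
    eblup = gamma * direct + (1.0 - gamma) * synthetic

    # Prasad-Rao MSE approximation: g1 + g2 + 2 * g3
    g1 = gamma * psi
    g2 = (1.0 - gamma) ** 2 * np.einsum("ij,jk,ik->i", X, A_inv, X)
    var_sigma2 = 2.0 * np.sum((sigma2 + psi) ** 2) / m**2
    g3 = psi**2 / (sigma2 + psi) ** 3 * var_sigma2
    mse = g1 + g2 + 2.0 * g3

    return {"sigma2": sigma2, "beta": beta, "gamma": gamma,
            "synthetic": synthetic, "eblup": eblup, "mse": mse}


rng = np.random.default_rng(1)
m = 40
x = rng.uniform(0.0, 1.0, m)                 # hypothetical provincial covariate
X = np.column_stack([np.ones(m), x])
psi = rng.uniform(0.001, 0.01, m)            # hypothetical sampling variances
theta = 0.1 + 0.2 * x + rng.normal(0.0, 0.05, m)
direct = theta + rng.normal(0.0, np.sqrt(psi))
fit = fay_herriot_eblup(direct, psi, X)
gain = 1.0 - np.mean(fit["mse"] / psi)       # complement to one of the mean MSE ratio
```

Provinces with no direct estimate, such as the ten listed in footnote 10, would receive only the regression-synthetic value, i.e. their covariate row multiplied by `fit["beta"]`. Because γ_i = σ_v²/(σ_v² + ψ_i), provinces with large sampling variances (small samples) lean almost entirely on the synthetic part, as the paper reports for L'Aquila and Biella.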
20 This consideration is also supported by comparing the mean value of the direct estimates (0.176) with the slightly higher mean value of the EBLUP composite estimates (0.180).

Table 2: Summary statistics on performance outcome measures.

| Outcome measure | Target variable | Mean | CV | Minimum | Maximum |
|---|---|---|---|---|---|
| Shrinkage factor (γ_i) | 1 HCR_I | 0.4109 | 0.5195 | 0.0005 | 0.8174 |
| | 2 HCR_NUTS2 | 0.3966 | 0.5281 | 0.0004 | 0.8053 |
| | 3 LogEquInc | 0.4149 | 0.5171 | 0.0005 | 0.8197 |
| | 4 Fuzzy Monetary | 0.2243 | 0.6449 | 0.0002 | 0.6121 |
| EBLUP Estimate / Direct Estimate | 1 HCR_I | 1.1156 | 0.4467 | 0.4099 | 3.6424 |
| | 2 HCR_NUTS2 | 1.0513 | 0.3003 | 0.2780 | 2.3296 |
| | 3 LogEquInc | 1.0002 | 0.0110 | 0.9771 | 1.0592 |
| | 4 Fuzzy Monetary | 1.0478 | 0.2909 | 0.4309 | 2.3770 |
| Mean-Squared Error (EBLUP Estimate) / Mean-Squared Error (Direct Estimate) | 1 HCR_I | 0.6562 | 0.3546 | 0.0247 | 0.9420 |
| | 2 HCR_NUTS2 | 0.6542 | 0.3502 | 0.0234 | 0.9389 |
| | 3 LogEquInc | 0.6679 | 0.3476 | 0.0246 | 0.9686 |
| | 4 Fuzzy Monetary | 0.5336 | 0.4087 | 0.0165 | 0.9132 |

Source: Our elaborations on ECHP data, Italian Section (1994-2001), and Istat (2001)

With regard to the target variable Head Count Ratio defined with respect to the income distribution separately within each NUTS2 region (HCR_NUTS2), we obtain results similar to those for HCR_I. The mean ratio EBLUP Estimate/Direct Estimate is again close to unity (1.0513), and the average improvement in efficiency deriving from modelling is equal to 0.3458. Inevitably, the values of the parameter gamma (γ) reflect the magnitude of the efficiency gain. For these two poverty measures, the mean gamma value is equal to 0.4109 and 0.3966, respectively. The lowest gamma values concern the provinces with a small sample size21.

The HCR_I and HCR_NUTS2 direct estimates are null for some Italian provinces. A deeper analysis shows that these null values concern the provinces with a very small sample size and, consequently, with high standard errors of the direct estimates.
21 For example, with regard to HCR_NUTS2, we compare the mean value of the gamma parameter for the provinces with a small sample size (0.0092) with the mean value for the other Italian provinces (0.4648). This comparison highlights the importance of the sample size in determining the gamma parameter and, consequently, the weight of the direct estimate in the final EBLUP composite estimate.

As illustrated above, the ECHP sample size is very small for eleven Italian provinces; we note that larger gains in efficiency are observed in the Italian provinces with smaller sample sizes22.

The target variable Logarithm of Equivalised Income (LogEquInc) shows the lowest mean ratio EBLUP Estimate/Direct Estimate among the income poverty measures (1.0002), the closest to unity. The highest values of this ratio are 3.6424 for HCR_I and 2.3296 for HCR_NUTS2; both values refer to the province of Lucca (Tuscany). For LogEquInc, the highest value is 1.0592 (province of Imperia, Liguria), which indicates a smaller divergence between the EBLUP composite and direct estimates than for the previous target variables23. The gain in efficiency, equal on average to 0.3321, is slightly lower than for the previous poverty measures24. The largest gain in efficiency, equal on average to 0.4664, is registered for the fuzzy income poverty measure, Fuzzy Monetary25.

In our study, we use the coefficient of variation (CV) of the ratios between EBLUP composite and direct estimates as an indicator of the variability of the divergence between the estimates before and after modelling. For each target variable, it provides a measure of variability relative to the mean value.
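The summary statistics of Table 2 and the synthetic gain measure reduce to a few lines. This minimal sketch uses hypothetical values for four provinces; dropping non-finite ratios mirrors footnote 23, while computing the CV with the population standard deviation (ddof=0) is our assumption, since the paper does not specify it. Applied to the HCR_I column of Table 2, the gain is 1 - 0.6562 = 0.3438.

```python
import numpy as np


def summarise(values):
    """Mean, CV, minimum and maximum of province-level outcome measures.

    Non-finite values (ratios undefined because the direct estimate is null)
    are dropped; the CV uses the population standard deviation (ddof=0).
    """
    v = np.asarray(values, dtype=float)
    v = v[np.isfinite(v)]
    mean = v.mean()
    return {"mean": mean, "cv": v.std(ddof=0) / mean,
            "min": v.min(), "max": v.max()}


def efficiency_gain(mse_eblup, mse_direct):
    """Complement to one of the mean of the ratios MSE(EBLUP)/MSE(direct)."""
    ratios = np.asarray(mse_eblup, dtype=float) / np.asarray(mse_direct, dtype=float)
    return 1.0 - ratios.mean()


# hypothetical values for four provinces; the third has a null direct estimate
direct = np.array([0.20, 0.10, 0.00, 0.25])
eblup = np.array([0.22, 0.12, 0.08, 0.20])
with np.errstate(divide="ignore", invalid="ignore"):
    ratio_stats = summarise(eblup / direct)   # the infinite ratio is dropped

gain = efficiency_gain([0.5, 0.8], [1.0, 1.0])  # 1 - mean(0.5, 0.8)
```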
The lowest coefficient of variation of these ratios is registered for the Logarithm of Equivalised Income (0.0110), against the much higher value for the Head Count Ratio defined with respect to the income distribution at the country level (0.4467)26.

22 For example, the province of Biella (Piedmont), where only 17 units were interviewed over 7 waves, shows a null HCR_I direct estimate with a high standard error (0.445); its EBLUP composite estimate is equal to 0.078, with a much lower standard error (0.048), i.e. a reduction of the standard error close to 89%; the gamma parameter, that is, the weight given to the direct estimate in the final EBLUP composite estimate, is very low (0.010). Similarly, the province of Lodi (Lombardy), where only 14 units were interviewed over 4 waves, shows an HCR_I direct estimate equal to 0.25 with a high standard error (0.453); its EBLUP composite estimate is equal to 0.105, with a much lower standard error (0.049), again a reduction close to 89%; once more, the gamma parameter is very low (0.010), expressing the low weight of the direct estimate in the final EBLUP composite estimate. Consistently, the lowest value of the gamma parameter (0.0005) is registered for the province of L'Aquila (Abruzzo), the province with the smallest number of units interviewed (1 unit over 1 wave). Conversely, the highest value of the gamma parameter (0.8174) is registered for the province of Milan (Lombardy), the province with the highest number of units interviewed (6313 units over 8 waves).

23 We exclude all the provinces where the ratio EBLUP Estimate/Direct Estimate is undefined because the direct estimate is null (or close to zero).

24 For the target variable LogEquInc, the lowest gain in efficiency is registered for the province of Milan (Lombardy) while, for the other monetary poverty measures, it is usually registered for the province of Rome (Latium).
It is important to note, however, that the sample sizes of the provinces of Milan (6313) and Rome (6276) differ only slightly.

25 For provinces with a very small sample size, the direct estimates of the conventional income poverty measures are often null. By contrast, the direct estimates of the fuzzy income poverty measure are usually non-zero, because the Fuzzy Monetary measure assigns a positive propensity to poverty to every individual except the richest, and so captures the nuances of poverty.

26 With regard to LogEquInc, the smaller divergence between direct and EBLUP composite estimates can also be verified by comparing the maximum (1.0592) and minimum (0.9771) values of the ratio EBLUP Estimate/Direct Estimate, both close to unity; their difference is very small, equal to 0.0821. By contrast, for HCR_I the range is far wider, equal to 3.2325.

Moreover, we use the coefficient of variation (CV) of the ratios between the mean-squared errors of EBLUP composite and direct estimates as an indicator of the variability of the efficiency gains deriving from the small area models. Our analysis highlights that the magnitude of the gain in efficiency for poverty estimates at NUTS3 level depends on both the sample sizes and the explanatory power of the territorial indicators. The highest CV value is registered for Fuzzy Monetary (0.4087), while the lowest is registered for the Logarithm of Equivalised Income (0.3476). However, the difference between the minimum and maximum CV values is not very large. Finally, the gamma parameter estimates show high variability for all the monetary poverty measures.

7.1 Some simulation results to evaluate the performances of small area estimators

Following Rao and Choudhry (1995) and Rao (2003), a simulation study is undertaken in order to assess the relative performance of the direct, synthetic and composite EBLUP estimators associated with the models adopted in this work. We then compute a set of indicators describing the average performance of the estimators over the non-empty Italian provinces (m = 93), which represent the small areas of interest. In the simulation experiment, the sample of 52,687 respondent households (pooling over all suitable ECHP waves) is treated as the overall population; in order to compare the estimators under study, we generate R = 500 simple random samples, each of size n = 1000, from this population. From each simulated sample, we calculate the direct, synthetic and EBLUP composite estimators for each target variable of our analysis; finally, for each estimator and each poverty measure, we compute the following indicators:

1) Average Absolute Relative Bias (AARB), evaluating the bias of an estimator;
2) Average Absolute Relative Error (AARE), measuring the accuracy of an estimator;
3) Average Relative Efficiency (AEFF), given by the ratio between the Average Relative Mean Square Errors (ARMSE) of the direct and indirect estimators, assessing the relative efficiency of the indirect estimators against the direct one.

In the following formulas, est_ir denotes the value of the estimator for the i-th province in the r-th simulated sample, whereas Y_i is the "true" NUTS3 poverty measure:

$$AARB = \frac{1}{m}\sum_{i=1}^{m}\left|\frac{1}{R}\sum_{r=1}^{R}\frac{est_{ir}}{Y_i} - 1\right| \tag{7.1}$$

$$AARE = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{R}\sum_{r=1}^{R}\left|\frac{est_{ir}}{Y_i} - 1\right| \tag{7.2}$$

$$ARMSE(est) = \frac{1}{m}\sum_{i=1}^{m}\frac{1}{R}\sum_{r=1}^{R}\left(est_{ir} - Y_i\right)^2 \tag{7.3}$$

$$AEFF = \frac{ARMSE(direct)}{ARMSE(est)} \tag{7.4}$$

Table 3: Comparison between direct, synthetic and EBLUP composite estimators (simulated data).
| Target variable | Estimator | AARB% | AARE% | ARMSE% | AEFF% |
|---|---|---|---|---|---|
| HCR_I | Direct | 0.7027 | 3.9041 | 12.15 | 100.00 |
| | Synthetic | 2.0669 | 2.2966 | 4.49 | 270.59 |
| | EBLUP | 1.4952 | 1.8228 | 3.75 | 324.39 |
| HCR_NUTS2 | Direct | 0.6675 | 3.6984 | 11.51 | 100.00 |
| | Synthetic | 1.9341 | 2.1756 | 4.10 | 280.65 |
| | EBLUP | 1.3930 | 1.7268 | 3.56 | 323.12 |
| LogEquInc | Direct | 0.5385 | 1.6466 | 22.72 | 100.00 |
| | Synthetic | 1.5658 | 1.3462 | 8.48 | 267.83 |
| | EBLUP | 1.1268 | 1.3380 | 7.19 | 315.99 |
| Fuzzy Monetary | Direct | 0.6602 | 3.9367 | 11.99 | 100.00 |
| | Synthetic | 1.9197 | 1.6505 | 2.63 | 455.30 |
| | EBLUP | 1.3815 | 2.0172 | 2.86 | 419.06 |

Source: Our elaborations on ECHP data, Italian Section (1994-2001), and Istat (2001)

The results confirm, for all the poverty measures considered in our work, the significant gain in efficiency derived from area-specific small area models. For HCR_I, HCR_NUTS2 and LogEquInc, the EBLUP composite estimators show the largest AEFF and the smallest AARE values; for the Fuzzy Monetary measure, the synthetic estimator performs slightly better than the EBLUP one (AEFF 455.30% against 419.06%), but both confirm the noteworthy efficiency gains obtained for this measure. As shown by the empirical results, the direct estimators overestimate the variability amongst the Italian provinces because of the sampling error, which increases as the area sample size decreases, whereas the synthetic estimators perform significantly better than the direct ones. This is probably because, unlike direct estimators, synthetic estimators take into account the relationship between poverty measures and some exogenous information. However, this reduction in variance is partly counter-balanced by the bias inherent in synthetic estimates, which is captured by the mean-squared error. Thus the EBLUP composite estimators, being an optimally weighted mixture of direct and synthetic estimators, are more likely to reflect the true variability than either of the two; as a result, they balance the potential bias (AARB) of the synthetic estimators against the instability of the direct ones.
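Given the replicate estimates produced by the simulated samples, indicators (7.1)-(7.4) reduce to a few array operations. This minimal sketch assumes the estimates of one estimator are stored in an R × m array (replicates by provinces) and the true provincial values in a length-m vector with no zeros; it does not reproduce the sampling scheme itself (R = 500 samples of n = 1000 households), and the toy numbers and function names are hypothetical. Table 3 reports AEFF as a percentage, so a ratio of 4 corresponds to 400%.

```python
import numpy as np


def simulation_indicators(est, truth):
    """Indicators (7.1)-(7.3) for one estimator.

    est   : (R, m) array, estimate for area i in simulated sample r
    truth : (m,) array, "true" area values Y_i (must be non-zero)
    """
    est = np.asarray(est, dtype=float)
    truth = np.asarray(truth, dtype=float)
    rel = est / truth                                 # est_ir / Y_i, broadcast over r
    aarb = np.mean(np.abs(rel.mean(axis=0) - 1.0))    # (7.1) bias of the replicate mean
    aare = np.mean(np.abs(rel - 1.0))                 # (7.2) average absolute relative error
    armse = np.mean((est - truth) ** 2)               # (7.3) average squared error
    return {"AARB": aarb, "AARE": aare, "ARMSE": armse}


def aeff(armse_direct, armse_est):
    """(7.4) relative efficiency of an estimator against the direct one."""
    return armse_direct / armse_est


# toy example: two areas, two replicates (hypothetical numbers)
truth = np.array([0.20, 0.10])
direct = np.array([[0.22, 0.09], [0.18, 0.11]])       # unbiased but noisy
synthetic = np.array([[0.21, 0.105], [0.21, 0.105]])  # stable but biased (+5%)
ind_direct = simulation_indicators(direct, truth)
ind_synth = simulation_indicators(synthetic, truth)
efficiency = aeff(ind_direct["ARMSE"], ind_synth["ARMSE"])
```

In the toy case the direct estimator has zero AARB but larger errors, while the synthetic one trades a 5% bias for a fourfold efficiency gain; with the replicates of an actual simulation, an EBLUP estimator would be expected to sit between the two in bias while improving markedly on the direct one in accuracy.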
That advantage becomes increasingly marked as we move to smaller domains with smaller sample sizes (in Italy, from regions to provinces).

8 The goodness of the indicators and the territorial perspective in the poverty analysis at a sub-national level

In order to evaluate the goodness, in terms of explanatory power, of the territorial indicators selected to define the synthetic estimates, we analysed these indicators with respect to each target variable. We therefore examined in depth the stepwise procedure for indicator selection and its effects on the variability of the composite estimates. Our analysis, conducted at a provincial level, highlights that the values of the gamma parameter reflect the reliability of the direct estimates: gamma increases with the sample size in the provinces. There is a positive relationship between the sample size and the weight given to the direct estimate in the final composite estimate. Hence, the advantages of auxiliary information become more marked as we move to provinces with smaller sample sizes, and the efficiency gains decrease as the provincial sub-sample size increases. In other words, the gains from modelling are more significant at finer levels of territorial breakdown. In fact, small area estimation models improve the efficiency and precision of the direct estimates only marginally at country level, especially when survey data can be cumulated over time, as in our analysis; the gains from modelling are more significant at regional level and even more so at provincial level. Such poverty analysis is also useful for identifying some of the socio-economic factors contributing to poverty levels and living standards.
Their exploration is essential for policy formulation and implementation aimed at eliminating the main causes of poverty, such as unemployment, family conditions, social and environmental difficulties and many other aspects not reflected in our analysis. As illustrated above, in our work the Head Count Ratios have been computed with respect to two poverty line levels, while the Fuzzy Monetary has only been computed with respect to the income distribution separately within each NUTS2 region. Defining poverty measures with respect to the regional income distribution allows us to take into account, within limits, some territorial specificities. The regression coefficients of the dummy variables measure, on average, the distance between the synthetic poverty estimates of the insular provinces and those of the other Italian provinces, net of the effects of the other regressors. In other words, for each Italian province, the regression coefficients assess the effect of being in a particular geographical area in comparison with the reference category (the insular provinces). For example, with regard to HCR_I, the coefficient for the North-West (-0.0990) shows that, on average, those provinces have a poverty incidence 0.099 lower than the insular provinces. That is, being located in the North-West of the country rather than in the Islands reduces the expected poverty incidence. In relation to the Head Count Ratio defined with respect to the income distribution at the country level (HCR_I), the most significant covariates are Activity Rate, Unemployment Rate, Resident Population per 100 inhabitants and Growth Enterprises Rate; the latter is significant at the 95% level, while the others are significant at 99%.
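The interpretation of the dummy coefficients (each one measures the shift in the expected synthetic estimate relative to the insular reference category, net of the other regressors) can be checked with a small least-squares sketch in Python with NumPy. The provinces, the single covariate and the coefficients below are hypothetical and chosen only for illustration; they are not the estimates reported in Table 4.

```python
import numpy as np

# Islands is the omitted reference category: an insular province has all dummies = 0.
GROUPS = ["NW", "NE", "Centre", "South"]

def design_matrix(activity, areas):
    """Columns: intercept, one continuous covariate, one 0/1 dummy per non-reference area."""
    cols = [np.ones(len(areas)), np.asarray(activity, dtype=float)]
    for g in GROUPS:
        cols.append(np.array([1.0 if a == g else 0.0 for a in areas]))
    return np.column_stack(cols)

# Hypothetical provinces and activity rates (not the paper's data).
areas = ["NW", "NW", "NE", "NE", "Centre", "Centre", "South", "South", "Islands", "Islands"]
activity = [0.55, 0.52, 0.56, 0.54, 0.50, 0.49, 0.42, 0.40, 0.41, 0.39]

# Hypothetical coefficients: intercept, activity, NW, NE, Centre, South.
beta_true = np.array([0.45, -0.75, -0.10, -0.09, -0.08, -0.05])
X = design_matrix(activity, areas)
y = X @ beta_true  # noiseless target, so OLS recovers beta_true (up to rounding)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Two provinces with the same activity rate, one in the North-West, one insular.
x_nw = design_matrix([0.45], ["NW"])[0]
x_isl = design_matrix([0.45], ["Islands"])[0]
gap = float(x_nw @ beta_hat - x_isl @ beta_hat)  # equals the NW dummy coefficient
print(f"NW coefficient: {beta_hat[2]:.4f}, predicted NW - Islands gap: {gap:.4f}")
```

Because the target is generated without noise, least squares recovers the hypothetical coefficients, and the predicted gap between a North-West and an insular province with the same activity rate equals the North-West coefficient whatever activity value is chosen; this is what "net of the effects of the other regressors" means in practice.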
Similarly, in relation to the Head Count Ratio defined with respect to the income distribution separately within each NUTS2 region (HCR_NUTS2), Resident Population per 100 inhabitants retains the high significance level of 99%, the significance level of Growth Enterprises Rate improves and, finally, Activity Rate and Unemployment Rate are significant at 95% and 90%, respectively. It is interesting to note that the coefficients of the statistically significant covariates have the same signs for both conventional poverty measures illustrated above. In particular, both target variables show a negative relationship with Activity Rate and Resident Population per 100 inhabitants and a positive relationship with Unemployment Rate and Growth Enterprises Rate27. With regard to the Logarithm of Equivalised Income (LogEquInc), most of the covariates (Activity Rate, Unemployment Rate, Resident Population per 100 inhabitants, Crude Death Rate and Growth Enterprises Rate) are significant at 95%, and Gross Domestic Product is significant at the 90% level. Finally, in relation to the Fuzzy Monetary, computed with respect to the income distribution separately within each NUTS2 region, Resident Population per 100 inhabitants and Growth Enterprises Rate retain high significance levels of 99% and 95%, respectively; Unemployment Rate is significant at 95% and Marriage Rate at 90%28.

27 Catania and Agrigento (Sicily) are the Italian provinces with the highest HCR_I composite estimates, equal to 0.486 and 0.482, respectively. They are immediately followed, with similar values, by other insular or southern provinces such as Nuoro (Sardinia), Foggia (Apulia) and Cosenza (Calabria). Another southern province with a high poverty incidence (0.414) is Crotone (Calabria), which shows the lowest activity rate (0.372) and an unemployment rate (0.171) far above the mean value. At the other extreme, the province of Bolzano (Trentino-South Tyrol), with the highest activity rate (0.587) and a very low unemployment rate (0.018), shows a low poverty incidence (0.083). Other Italian provinces with very low HCR_I values are Ferrara, Rimini (Emilia-Romagna) and Genoa (Liguria), equal to 0.040, 0.043 and 0.047, respectively.

Table 4: Parameter Estimation and Significance Level.
Independent variables (rows): Intercept, Activity Rate, Unemployment Rate, Resident Population per 100 inhabitants, Crude Birth Rate, Crude Death Rate, Marriage Rate, Suicides per 100,000 inhabitants, Legal Separation Rate, Divorce Rate, Growth Enterprises Rate, Gross Domestic Product (GDP), North-West, North-East, Middle, South
Target variables (columns): HCR_I, HCR_NUTS2, LogEquInc, Fuzzy Monetary
0.4618*** -0.7571*** 0.8028*** -0.1955*** 0.9862 2.3773** -0.0990** -0.0979** -0.0947*** -0.0679*** 0.2909** -0.4939** 0.3375* -0.1976*** -0.2690 0.7376 3.0945*** 0.0692* 0.0498 0.0398 -0.0226 7.8776*** 1.5275** -0.8490** 0.2650** 3.0416** -0.8444 -4.3523** 0.0014* 0.1263** 0.1198* 0.0916* 0.0849** 0.0223 0.3469** -0.1518*** 2.1750* 1.1677 1.7817** 0.0121 -0.0029 0.0060 -0.0367
Significance levels: *** 99%; ** 95%; * 90%
Source: Our elaborations on ECHP data, Italian Section (1994-2001), and Istat (2001)

In short, some territorial indicators are consistently significant for all the target variables. In particular, Resident Population per 100 inhabitants and Growth Enterprises Rate are consistently significant at least at the 95% level, and Unemployment Rate at least at the 90% level. If we restrict our analysis to the traditional income poverty measures, the set of consistently significant covariates grows: Activity Rate is added as well.
Some other covariates (Crude Birth Rate, Suicides per 100,000 inhabitants, Legal Separation Rate and Divorce Rate) are never statistically significant. We decided to include them in the small area models because they also partly contribute to improving the efficiency of some poverty estimates29. A final consideration concerns the economic interpretation of the territorial indicators used to define the composite poverty estimates. For example, the Growth Enterprises Rate (the ratio of the difference between the enterprises registered and those that ceased activity during the year to the stock of enterprises existing at the end of the previous year) must be interpreted together with all the other indicators and, particularly, in relation to activity, employment and unemployment levels30.

28 It is interesting to note that the coefficients of the statistically significant covariates for the Fuzzy Monetary also have the same signs as those of the conventional income poverty measures. With regard to LogEquInc, the signs of the same statistically significant covariates are, of course, the exact opposite of those of HCR_I and HCR_NUTS2. LogEquInc is the target variable with the highest number of statistically significant covariates.

8.1 The location effect on poverty estimates

With regard to the Head Count Ratio defined with respect to the income distribution at the country level (HCR_I), all dummy variables included in the small area model are statistically significant at least at the 95% level. The t tests for individual coefficients show that the expected HCR_I for the provinces belonging to each geographical division is significantly different from that of the reference group.
In other words, the negative coefficients of all the dummy variables indicate that the predicted synthetic estimates of HCR_I for the provinces belonging to the North-West, the North-East, the Centre and the South of Italy are lower than the predicted values for the insular provinces. Obviously, the differences among the poverty measures of different geographical divisions are captured by the entire set of dummy variables rather than by any single dummy variable.

29 With regard to the territorial distribution of Italian provinces in relation to the Fuzzy Monetary, the highest values are registered for the provinces of Nuoro, Cagliari (Sardinia) and Catania (Sicily), equal to 0.282, 0.266 and 0.280, respectively. Conversely, the provinces with the lowest Fuzzy Monetary composite estimates are Trieste (Friuli-Venezia Giulia) and Prato (Tuscany), equal to 0.062 and 0.078, respectively. As illustrated above, in relation to the Fuzzy Monetary, beyond the traditional economic covariates, a social territorial indicator, the Marriage Rate, is added as a statistically significant variable; indeed, its positive correlation with the Fuzzy Monetary is fairly high (close to 0.60). On the one hand, Ferrara (Emilia-Romagna), Biella (Piedmont), Ravenna and Bologna (Emilia-Romagna) are the Italian provinces with the lowest marriage rates (all lower than 0.037) and have low FM values (significantly smaller than the mean); on the other hand, Foggia (Apulia), Palermo (Sicily), Crotone (Calabria) and Naples (Campania) are the provinces with the highest marriage rates (all higher than 0.20, significantly above the mean). However, the differences among the fuzzy poverty estimates of different Italian provinces are captured by the entire set of covariates considered in our analysis.
30 The Growth Enterprises Rate reflects the birth and death of enterprises without considering the related employment effects. For example, the birth of new enterprises, which increases employment, may be followed by the death of a smaller number of enterprises that are, however, larger in terms of employed units. Since the Growth Enterprises Rate cannot detect this phenomenon, only an analysis integrated with the employment and/or unemployment rates provides a sound interpretation of the effects of these indicators on the composite poverty estimates.

Similarly, in relation to the Logarithm of Equivalised Income (LogEquInc), all dummy variables included in the small area model are statistically significant at least at the 90% level. The signs of the dummy coefficients, the exact opposite of those for HCR_I, express a positive relationship with the target variable. In other words, the positive coefficients of all the dummy variables indicate that the predicted synthetic estimates of LogEquInc for the provinces belonging to the North-West, the North-East, the Centre and the South of Italy are higher than the predicted values for the insular provinces. Overall, with regard to the Head Count Ratio defined with respect to the income distribution separately within each NUTS2 region (HCR_NUTS2) and to the Fuzzy Monetary, the dummy variables are not statistically significant. This is probably because those poverty measures are defined with respect to the income distribution separately within each NUTS2 region and therefore already incorporate the territorial perspective.
Nevertheless, ranking the NUTS3 provinces by the mean of the Logarithm of Equivalised Income (LogEquInc) brings out huge disparities among Italian provinces and, consequently, a strong negative correlation with the poverty measures; as expected, across provinces, average monetary deprivation increases as income decreases. Comparing the conventional and fuzzy income poverty measures, we observe some significant differentials across Italian provinces, which provide additional insights into the territorial distribution of deprivation. In particular, the geographical areas with the highest concentration of poverty (the Centre, the South and the Islands) show mean values of FM significantly higher than the corresponding HCR_NUTS2; this denotes an even more acute poverty situation, in terms of lower income levels, which the traditional measures cannot capture. Consequently, the FM/HCR_NUTS2 ratios are higher than 1 for the Islands (1.1630), the Centre (1.1974) and the South (1.0630). These differentials diminish as we move towards the richest provinces of the North-East (0.9941) and the North-West (0.9416), where the ratios are slightly lower than unity31. In relation to HCR_I and LogEquInc, comparing the estimated coefficients of the dummy variables shows that their absolute values decrease as we move from the North-West and the North-East to the Centre and the South. For both target variables, the difference between the North-West and North-East coefficients is small but always statistically significant. The distance of these two coefficients from that of the Centre is more marked, and it becomes very clear in comparison with the South.

31 As expected, the highest values of the FM/HCR_NUTS2 ratio are registered for insular and southern provinces such as Sassari (Sardinia), Syracuse, Messina (Sicily) and Taranto (Apulia), equal to 1.348, 1.456, 1.555 and 1.300, respectively. Conversely, the lowest values of the same ratio are found for northern provinces such as Biella (Piedmont) and Forlì-Cesena (Emilia-Romagna), equal to 0.737 and 0.793, respectively.

Finally, it is important to note that, while a poverty incidence analysis conducted with respect to the income distribution at country level clearly highlights the enormous territorial differences between the North and the South of Italy, an analysis conducted with respect to the income distribution separately within each NUTS2 region notably reduces these distances32. Nevertheless, when Italian provinces are ranked by HCR_NUTS2, the poverty patterns are substantially unchanged33. Hence, residence in a southern province is often closely associated with poverty status34. Even though at a lower degree of territorial disaggregation, this distinctive characteristic of the Italian economy has been widely confirmed by many studies on poverty and living conditions. In particular, results of this kind have been obtained by Coccia and Masi
(2003), who performed comparative analyses at a regional level; by Istat (2006), which, highlighting how the risk of poverty varies significantly across demographic, social and, above all, geographical dimensions, emphasizes the greater risk of poverty among households living in the South of Italy compared with those living in the North and, especially, how large those differentials are; and by several other studies (Betti & Verma, 2004; Brasini & Tassinari, 2004; Mastrovita et al., 2003) that also aim to identify many other aspects of poverty, not reflected in our analysis, which go beyond an income approach.

9 Concluding remarks and further developments

As illustrated above, the empirical results of our analysis emphasize a distinctive characteristic of the Italian economy: the huge territorial differences in the socio-economic conditions of the Italian population. Poverty incidence analysis clearly highlights the gap between the “rich” northern provinces, with poverty incidence rates usually lower than the mean value, and the “poor” southern ones, with poverty incidence rates usually higher than the mean value. For each Italian province, comparing direct and composite estimates of poverty incidence reveals the North-South dualism, with a high presence of poor people in the southern and insular provinces. The values of the target variable Logarithm of Equivalised Income (LogEquInc), substantially higher in the northern provinces than in the southern ones, confirm that situation.

32 The lower variability of the HCR_NUTS2 estimates compared with HCR_I is reflected in an average standard error of 0.045, notably lower than the mean standard error of HCR_I (0.134).

33 As a matter of fact, in that context, insular or southern provinces such as Nuoro, Cagliari (Sardinia), Catania (Sicily), Catanzaro (Calabria) and Foggia (Apulia) keep their severe poverty status, with values equal to 0.312, 0.2637, 0.272, 0.265 and 0.257, respectively. Conversely, the provinces of Prato (Tuscany), Trieste (Friuli-Venezia Giulia), Rimini (Emilia-Romagna) and Genoa (Liguria) show the lowest HCR_NUTS2, equal to 0.019, 0.046, 0.084 and 0.091, respectively.

34 As demonstrated, northern and central Italian provinces show lower values for all the income poverty measures than their southern and insular counterparts; as a mirror image, northern and central provinces show higher income values than southern ones.

The results we obtained are interesting, taken as a whole. The noteworthy efficiency gains for the poverty measures and the high level of statistical significance of most territorial indicators highlight the adequacy of the model. As demonstrated above, this performance is confirmed by all the outcome measures and quality indicators (Rao, 2003) as well as by Spearman’s rank correlation between direct and composite estimates35. As illustrated, in our paper we considered several income poverty measures and chose to extend the analysis to a fuzzy measure treating income poverty as a matter of degree. This allows us to identify not only the individuals but also the areas that, more than others, need structural interventions. Comparing and contrasting the conventional and fuzzy poverty measures reveals differentials in the level and intensity of poverty among geographical areas and provides additional information for policy formulation and implementation aimed at removing or, at least, reducing the potential causes of poverty. In addition to the income poverty measures, the living standards of households and individuals can be described by a host of non-monetary indicators according to a multidimensional approach.
By appropriately weighting non-monetary indicators of deprivation, it is possible to construct quantitative indices of deprivation in its various dimensions, thus also viewing non-monetary deprivation as a matter of degree (Betti & Verma, 2004). The ECHP is an important data source for multidimensional poverty analysis because it provides many indicators that can be classified into several homogeneous groups, each representing a specific dimension of poverty. The EU-SILC (European Union Statistics on Income and Living Conditions) project, which will provide two types of annual data, cross-sectional and longitudinal, includes a large set of non-monetary indicators that will allow the comparative poverty analysis to be extended according to a multidimensional and fuzzy approach, at national and international level36. We deliberately neglected these aspects here, but we intend to examine them closely in future work.

35 For all the poverty measures considered in our analysis, Spearman’s rank correlation is high, meaning that the rankings of the two types of estimates, direct and composite, are substantially the same. In particular, Spearman’s rank correlation is equal to 0.87 for HCR_I, 0.84 for HCR_NUTS2, 0.92 for LogEquInc and 0.75 for FM.

36 ECHP was a pioneering European survey until 2001; it is currently being replaced by data collection under the EU-SILC Framework Regulation (No. 1177/2003, 16 June 2003) and associated Implementing Regulations. For EU-SILC, priority is given, apart from the timeliness of cross-sectional and longitudinal data, to flexibility, comparability and full geographical coverage. Henceforth, EU-SILC is to become the EU reference source for comparative statistics on income distribution and social exclusion at European level.
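Footnote 35 summarises the agreement between the rankings of direct and composite estimates with Spearman’s rank correlation. A minimal NumPy sketch of that statistic, computed as the Pearson correlation of average ranks so that ties are handled (scipy.stats.spearmanr computes the same quantity); the estimates for six provinces below are hypothetical.

```python
import numpy as np

def average_ranks(x):
    """Return 1-based ranks of x; tied values share the average of their positions."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1  # extend the block of tied values
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman_rho(a, b):
    """Spearman's rank correlation: the Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

# Hypothetical direct and composite estimates for six provinces (not the paper's data).
direct = [0.48, 0.35, 0.10, 0.22, 0.05, 0.30]
composite = [0.45, 0.33, 0.12, 0.20, 0.07, 0.34]
rho = spearman_rho(direct, composite)
print(f"Spearman's rho = {rho:.4f}")
```

With no ties this agrees with the textbook formula 1 - 6 Σ d_i² / (n(n² - 1)): here the rank differences are (0, 1, 0, 0, 0, -1), so rho = 1 - 12/210, about 0.943.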
Another consideration concerns the sample sizes of the available surveys, which are probably too small to support estimation at NUTS4 or NUTS5 level. Even after aggregation over waves, the ECHP does not allow poverty estimates beyond NUTS3 level; similarly, the Istat database of Territorial Indicators provides a valuable data source for the construction of sub-national indicators with at most a NUTS3 breakdown. Nevertheless, the larger sample size of the EU-SILC Italian Section and its specific rotational design37, recommended by Eurostat, could allow an immediate improvement in the efficiency of direct poverty estimates, at least at NUTS2 level, further extending the possibility of testing methods to obtain reliable estimates at a finer degree of spatial disaggregation as well as the opportunity to improve the analysis of territorial disparities38. However, further aspects of this problem could be investigated. In particular, it would be interesting to assess the potential gain in efficiency of the poverty estimates when time-specific random effects are considered; we plan to study this in future work. Finally, the small area models adopted in our paper treat the random area effects as independent. In practice, it would be more reasonable to assume that the random effects of neighbouring areas are correlated and that the correlation decays to zero as distance increases (Rao, 2003; Singh et al., 2005; Pratesi & Salvati, 2005; Petrucci et al., 2005; Petrucci & Salvati, 2004, 2005). In the present study we neglected these aspects; we intend to examine them closely in future work in order to explore the spatial dimensions of the data and their contribution to improving the composite poverty estimates.
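The spatial extension mentioned above replaces independent random area effects, whose covariance matrix is sigma_v2 times the identity, with correlated ones. A minimal sketch, assuming an exponential distance-decay covariance Cov(v_i, v_j) = sigma_v2 * exp(-d_ij / phi). This is only one common choice (Pratesi & Salvati (2005), for example, work with autoregressive random area effects), and the centroids, sigma_v2 and range parameter phi below are hypothetical.

```python
import numpy as np

def exponential_area_covariance(coords, sigma_v2, phi):
    """Covariance of area random effects: Cov(v_i, v_j) = sigma_v2 * exp(-d_ij / phi).

    coords: (m, 2) array of area centroids; phi: range parameter (same units as coords).
    As phi -> 0 the matrix tends to sigma_v2 * I, i.e. independent random area effects.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    return sigma_v2 * np.exp(-d / phi)

# Hypothetical centroids (km) of four areas along a line: 0, 50, 100 and 400 km.
coords = [[0, 0], [50, 0], [100, 0], [400, 0]]
sigma_v2 = 0.004  # hypothetical random-effect variance
G = exponential_area_covariance(coords, sigma_v2, phi=100.0)
corr = G / sigma_v2
np.linalg.cholesky(G)  # raises LinAlgError if G were not positive definite
print(np.round(corr[0], 3))  # correlation of area 1 with areas 1..4
```

The printed correlations fall off with distance, and shrinking phi towards zero recovers the independent-effects structure assumed in the models of this paper; in a spatial extension of the area-level model, a matrix of this kind would take the place of sigma_v2 times the identity.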
Acknowledgements

The authors would like to thank the referees for their useful suggestions, which helped to clarify some technical issues, and Prof. J.N.K. Rao for his helpful comments on an earlier version of this paper, which led to many improvements in the work.

37 The theoretical sample size of the EU-SILC Italian Section is close to 32,000 households, far larger than that of a single ECHP wave. Rotational design refers to a sample selection based on a number of sub-samples, each similar in size and design and representative of the whole population. From one year to the next, some sub-samples are retained, while others are dropped and replaced by new sub-samples.

38 In the case of partially overlapping samples, Kish (1999) recommended “rolling samples” as a method of cumulating data over time, because they aim at a much greater spread, facilitating the maximal spatial range for cumulation over time. This, in turn, leads to improved small area estimates when the periodic samples are cumulated (Rao, 2003).

References

[1] Battese, G., Harter, R., and Fuller, W. (1988): An error-components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83, 28-36.
[2] Berger, Y.G. and Skinner, C.J. (2003): Variance estimation for a low-income proportion. Applied Statistics, 52, 457-468.
[3] Betti, G. and Verma, V. (2004): A methodology for the study of multi-dimensional and longitudinal aspects of poverty and deprivation. Working Paper, 49. Siena: Dipartimento di Metodi Quantitativi, Università degli Studi.
[4] Brasini, S. and Tassinari, G. (2004): Multiple deprivation, income and poverty in Italy: An analysis based on European Community Household Panel. Statistica, 4, 673-696.
[5] Cannari, L. and D’Alessio, G. (2003): La distribuzione del reddito e della ricchezza nelle regioni Italiane. Temi di Discussione, 482. Banca d’Italia.
[6] Cerioli, A. and Zani, S. (1990): A fuzzy approach to the measurement of poverty. Studies in Contemporary Economics.
[7] Cheli, B. (1995): Totally fuzzy and relative measures in dynamics context. Metron, 53 (3/4), 183-205.
[8] Cheli, B. and Lemmi, A. (1995): A totally fuzzy and relative approach to the multidimensional analysis of poverty. Economic Notes, 24, 115-134.
[9] Cheli, B. and Betti, G. (1999): Fuzzy analysis of poverty dynamics on an Italian pseudo panel. Metron, 57, 83-103.
[10] Coccia, G. and Masi, A. (2003): L’analisi degli Indicatori di Povertà Regionale. Rivista di Statistica Ufficiale, 2. Istat.
[11] Dagum, C. (2002): Analysis and measurement of poverty and social exclusion using fuzzy set theory. Application and policy implications. In C. Dagum and G. Ferrari (Eds): Household Behaviour, Equivalence Scales, Welfare and Poverty. Physica-Verlag.
[12] Datta, G.S., Lahiri, P., and Maiti, T. (2002): Empirical Bayes estimation of median income of four-person families by state using time series and cross-sectional data. Journal of Statistical Planning and Inference, 102, 83-97.
[13] Di Consiglio, L., Falorsi, S., Paladini, P., Righi, P., Scavalli, E., and Solari, F. (2003): Stimatori per Piccole Aree per le Stime di Povertà Regionali. Rivista di Statistica Ufficiale, 2. Istat.
[14] European Commission (2005): Regional Indicators to Reflect Social Exclusion and Poverty. European Commission Employment and Social Affairs DG. Brussels. (With the contributions of Betti, G., Lemmi, A., Mulas, A., Natilli, M., Neri, L., Salvati, N., and Verma, V.).
[15] Eurostat (1996): The European Community Household Panel (ECHP): Survey Methodology and Implementation, 1. Luxembourg: Office for Official Publications of the European Communities.
[16] Falorsi, P.D., Falorsi, S., and Russo, A. (2003): Stimatori per Piccole Aree per le Indagini Istat sulle Famiglie. Rivista di Statistica Ufficiale, 2. Istat.
[17] Fay, R.E. and Herriot, R.A. (1979): Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74, 269-277.
[18] Fabrizi, E., Ferrante, M.R., and Pacei, S. (2005): Small area estimation of average household income based on panel data. Proceedings of the American Statistical Association, Statistical Computing Section, Alexandria, VA: American Statistical Association, 3464-3470.
[19] Ferrante, M.R., Fabrizi, E., and Pacei, S. (2004): Estimation of regional income indicators using small area methods for panel data. Relazione invitata, Atti della XLII Riunione Scientifica. Università di Bari, 9-11 Giugno, CLUEP editore, 401-412.
[20] Gagliardi, F., Nandi, T.K., and Verma, V. (2006): Variance estimation of longitudinal measures of poverty. Working Paper, 64. Siena: Dipartimento di Metodi Quantitativi, Università degli Studi.
[21] Ghosh, M., Nangia, N., and Kim, D. (1996): Estimation of median income of four-person families: A Bayesian time series approach. Journal of the American Statistical Association, 91, 1423-1431.
[22] Ghosh, M. and Rao, J.N.K. (1994): Small area estimation: An appraisal (with discussion). Statistical Science, 9, 55-93.
[23] Hardy, M.A. (1993): Regression with dummy variables. Sage University Paper series on Quantitative Applications in the Social Sciences, 07-093. Newbury Park, CA: Sage.
[24] Istat (2006): La povertà relativa in Italia nel 2005. Statistiche in breve, Famiglia e Società, 11 Ottobre.
[25] Kackar, R.N. and Harville, D.A. (1984): Approximations for standard errors of estimators for fixed and random effects in mixed models. Journal of the American Statistical Association, 79, 853-862.
[26] Kish, L. (1965): Survey Sampling. New York: Wiley.
[27] Kish, L. (1989): Sampling methods for agriculture surveys. Statistical Development Series, 3. Rome: FAO.
[28] Kish, L. (1990): Rolling samples and censuses. Survey Methodology, 16, 63-71.
[29] Kish, L. (1999): Cumulating/Combining population surveys. Survey Methodology, 25, 129-138.
[30] Lemmi, A., Pannuzi, N., Mazzoli, B., Cheli, B., and Betti, G. (1997): Misure di Povertà Multidimensionali e Relative: il Caso dell’Italia nella Prima Metà degli Anni ’90. In C. Quintano (Ed): Scritti di Statistica Economica 3, Quaderni di discussione 13, Istituto di Statistica e Matematica, Istituto Universitario Navale, Napoli.
[31] Marker, D.A. (2001): Producing small area estimates from national surveys: Methods for minimizing use of indirect estimators. Survey Methodology, 27, 183-188.
[32] Mastrovita, S., Nicoletta, P., and Vignani, D. (2003): Povertà ed Esclusione Sociale: i Profili del Disagio. Rivista di Statistica Ufficiale, 2. Istat.
[33] Prasad, N. and Rao, J.N.K. (1990): The estimation of the mean-squared error of small area estimators. Journal of the American Statistical Association, 85, 163-171.
[34] Pratesi, M. and Salvati, N. (2005): Small area estimation: The EBLUP estimator with autoregressive random area effects. Working Paper. Pisa: Dipartimento di Statistica e Matematica Applicata all’Economia, Università degli Studi.
[35] Petrucci, A. and Salvati, N. (2004): Small area estimation considering spatially correlated errors: The unit level random effects model. Working Paper. Firenze: Dipartimento di Statistica G. Parenti, Università degli Studi.
[36] Petrucci, A., Pratesi, M., and Salvati, N. (2005): Geographic information in small area estimation: Small area models and spatially correlated random area effects. Statistics in Transition, 7, 609-623.
[37] Petrucci, A. and Salvati, N. (2005): Small area estimation for spatial correlation in watershed erosion assessment. Journal of Agricultural, Biological and Environmental Statistics.
[38] Quintano, C., Castellano, R., and Regoli, A. (2003): The effect of the income imputation on poverty measurement: The approach of nonparametric bounds. JSM 2003 Proceedings. San Francisco, California, August 3-7.
[39] Quintano, C., Castellano, R., and D’Agostino, A. (2004): The incidence of poverty in Italy: A comparison of three statistical methods. In C. Quintano (Ed): Scritti di Statistica Economica 10, Quaderni di discussione 24, Istituto di Statistica e Matematica, Università degli Studi di Napoli “Parthenope”, Napoli.
[40] Rao, J.N.K. (2005): Inferential issues in small area estimation: Some new developments. Statistics in Transition, 7, 513-526.
[41] Rao, J.N.K. (2003): Small Area Estimation. London: Wiley.
[42] Rao, J.N.K. and Choudhry, G.H. (1995): Small area estimation: Overview and empirical study. In B.G. Cox et al. (Eds): Business Survey Methods. New York: John Wiley & Sons.
[43] Rao, J.N.K. and Yu, M. (1992): Small area estimation by combining time series and cross-sectional data. Canadian Journal of Statistics, 22, 511-528.
[44] Sen, A. (1976): Poverty: An ordinal approach to measurement. Econometrica, 44, 219-231.
[45] Singh, B.B., Shukla, G.K., and Kundu, D. (2005): Spatio-temporal models in small area estimation. Survey Methodology, 31, 183-195.
[46] Teekens, R. and Zaidi, M. (1989): Relative and absolute poverty in the European community: Results from family budget surveys. Workshop “Poverty Statistics in European Community”, Noordwijk.
[47] Verma, V. and Thanh, L. (1996): An analysis of sampling errors for the demographic and health surveys. International Statistical Review, 64, 265-294.
[48] Verma, V. (2002): Comparability in international survey statistics. International Conference on Improving Surveys, Copenhagen, 25-28 August.
[49] Verma, V. (2004): Sampling errors and design effects for poverty measures and other complex statistics. VII Convegno Internazionale, Metodi Quantitativi per le Scienze Applicate, Università di Siena.
[50] Verma, V. and Betti, G. (2005): Sampling errors and design effects for poverty measures and other complex statistics. Working Paper, 53. Siena: Dipartimento di Metodi Quantitativi, Università degli Studi.
[51] Verma, V., Betti, G., and Natilli, M. (2005): Indicators of social exclusion and poverty in Europe’s regions. Working Paper. Siena: Dipartimento di Metodi Quantitativi, Università degli Studi.
[52] Verma, V. (1993): Sampling errors in household surveys. National Household Survey Capability Programme. Statistical Division, United Nations, New York.
[53] Verma, V., Scott, C., and O’Muircheartaigh, C. (1980): Sample designs and sampling errors for the World Fertility Survey. Journal of the Royal Statistical Society, 143, 431-473.
[54] Whelan, C.T., Layte, R., Maitre, B., and Nolan, B. (2001): Income, deprivation and economic strain: An analysis of the European community household panel. European Sociological Review, 17, 357-372.
[55] Zadeh, L.A. (1965): Fuzzy sets. Information and Control, 8, 338-353.
[56] Zheng, B. (2001): Statistical inference for poverty measures with relative poverty lines. Journal of Econometrics, 101, 337-356.