Informatica 32 (2008) 133-145

Methodology for the Estimation of Annual Population Stocks by Citizenship Group, Age and Sex in the EU and EFTA Countries

Jakub Bijak and Dorota Kupiszewska
Central European Forum for Migration and Population Research (CEFMR)
ul. Twarda 51/55, 00-818 Warsaw, Poland
E-mail: j.bijak@cefmr.pan.pl, d.kupisz@cefmr.pan.pl
www.cefmr.pan.pl

Keywords: population estimates, stocks by citizenship, Europe, missing data, cohort-wise methods, fitting methods

Received: March 9, 2008

The paper addresses selected computational issues related to the challenge of dealing with poor statistics on international migration. Partial results of the on-going Eurostat-funded project on "Modelling of statistical data on migration and migrant population" (MIMOSA) are presented. The focus is on the data on population stocks by broad group of citizenship, sex and age. After a brief overview of the main problems with data on population by citizenship for 31 European countries (27 European Union countries, Iceland, Liechtenstein, Norway and Switzerland), a range of computational methods is proposed, including cohort-wise interpolation, cohort-component projections, cohort-wise weights propagation and proportional fitting methods, as well as other, auxiliary methods. An algorithm for choosing the best method for estimating missing data on population stock by broad citizenship (nationals, foreigners - EU27 citizens, foreigners - non-EU27 citizens), five-year age group (up to 85+) and sex on 1st January 2002-2006 is proposed and illustrated by examples of its application for selected countries.

Povzetek (Summary): Various methods for estimating demographic data are described.

1 Introduction

The deficiencies of statistical information on migration-related variables, such as population flows or stocks, are well-known and widely discussed in the literature [1, 7].
The aim of the paper is to contribute to the work on dealing with these shortcomings and to propose a set of computational methods, as well as an algorithmic procedure for selecting the best one, for the estimation of population stocks as of 1st January in a breakdown by sex, age group and broad citizenship category, for the countries for which information is unavailable or incomplete.

The study was carried out within a Eurostat-funded project on "Modelling of statistical data on migration and migrant population" (MIMOSA). It covers 31 European countries, of which 27 belong to the European Union (as of 1st January 2007) and a further four to the EFTA (Iceland, Liechtenstein, Norway and Switzerland). The period of interest is 2002-2006. The citizenship groups under study are: nationals, European Union (EU27) foreigners and non-EU27 foreigners, while the age groups are five-year, with the last, open-ended category being 85 years or more.

Generally, the proposed estimation methods aim to combine data from different sources (population census, vital statistics, data on acquisition of citizenship, specific surveys, etc.). In principle, the data that are already available are not modified (for example, in order to harmonise definitions, or for any other reason), except in the case of inconsistencies between the sources. In such cases, the demographic data provided to Eurostat by national statistical institutes (NSIs) are given priority.

Apart from the Introduction, the paper is structured in four sections. Section 2 contains summary information on the availability and quality of the 2002-2006 data on population stocks for the 31 countries under study. In Section 3, the proposed methodology for estimating population stocks by sex, age and citizenship groups is discussed.
This section presents such tools as estimation of data in single years of age from five-year age groups, cohort-wise interpolation of population stocks, cohort-component projections, cohort-wise propagation of weights, proportional fitting, as well as other, auxiliary methods. Subsequently, Section 4 contains recommendations concerning the procedure of selecting appropriate estimation methods for each of the countries under study, presented in the form of a decision algorithm and accompanied by several illustrative examples for selected countries. The discussion is summarised in Section 5.

The study is based on the data available in the Eurostat databases, supplemented by additional information obtained from national statistical institutes, whenever required and feasible. Throughout the paper, the abbreviation 'NSI' is used to denote the national statistical institute of the respective country, 'JMQ' stands for the Joint Questionnaire on International Migration Statistics (hereafter: Joint Migration Questionnaire) of Eurostat, UN Statistical Division, UN Economic Commission for Europe, the Council of Europe and the International Labour Office, and 'LFS' denotes the Labour Force Survey.

2 Availability of the 2002-2006 data on population stocks for 31 European countries

Annual statistics on usually resident population by citizenship, sex and age are collected by Eurostat from the NSIs via the Joint Questionnaire on International Migration, together with migration flow data. Population statistics for 37 European countries, collected through the JMQ, are checked and subsequently loaded into Eurostat's on-line database, NewCronos. The data are located under the Population and Social conditions theme, in the International Migration and Asylum domain (MIGR), tables migrst_popctz (population by sex and citizenship) and migr_st_popage (population by age group, citizenship and sex).
The data for 2000-2006 come from the following tables in the 2000-2006 JMQs:
■ Table 7a (Table 8a in the 2000-2003 JMQs): Usually resident population by citizenship and age, both sexes;
■ Table 7b (Table 8b in the 2000-2003 JMQs): Usually resident population by citizenship and age, males;
■ Table 7c (Table 8c in the 2000-2003 JMQs): Usually resident population by citizenship and age, females.

A detailed analysis of statistics on population stocks by citizenship provided by the 31 countries covered the JMQs for the reference period 2002-2006. Selected results of the analysis of the data availability for particular countries are summarised in Table 1, providing an overview of the situation for all 31 countries. The information on the lack of data, marked as 'not available' in Table 1, was based on the information provided in the JMQ or on information obtained during the THESIM project (see footnote 1). Other missing data were marked as 'not provided to Eurostat'. In addition to missing data, a number of other problems were detected, for example the presence of provisional data, data for some citizenship categories only, broad age groups, or a reference date other than 1st January.

Data on total population stock on 1st January, not disaggregated by citizenship, are also collected by Eurostat within the framework of the Annual Demographic Statistics data collection. These data, disaggregated by sex and age, are located under the Population and social conditions theme, in the Demography (DEMO) domain of the database, table demo_pjan. A review of the availability of these data for the years 2002-2006 revealed that the data on total population stock by sex and five-year age group (up to 85+) are available for all 31 countries, with the following exceptions: there are no 2006 data by age for Belgium and Italy, while for Romania the highest age group in the 2004 data is 80+.
Footnote 1: Research project THESIM: Towards Harmonised European Statistics on International Migration, funded by the European Commission through the Sixth Framework Programme and executed by a research consortium led by Groupe d'étude de démographie appliquée (GéDAP), Université Catholique de Louvain.

In addition to the annual data, Eurostat also collects and disseminates statistics on population by citizenship, sex and age obtained by the countries during population censuses. Like other statistics, the census data are located under the Population and social conditions theme, in the Census (CENS) domain of the database, table censnsctz. Unlike annual population figures, the census data on population by citizenship, sex and age are available for almost all 31 countries, with the notable exceptions of the United Kingdom, Germany and Malta.

A supplementary source of data on population stock by citizenship is the Labour Force Survey. However, the availability of data from the LFS in the Eurostat database is very limited and the reliability of the data is probably not high, due to the nature of the data source. By definition, the LFS statistics are estimates and thus bear certain errors, which can be relatively high for disaggregated categories (e.g., for population broken down by age, sex and citizenship groups). Nevertheless, some use of the LFS data could be considered as an alternative to the proposed methods in the countries where data on total nationals and total foreigners are missing. In the Eurostat database, the LFS tables are located under the Population and social conditions theme, the Labour market (LABOUR) domain, in the table with population data containing the nationality dimension (population by sex, age groups, nationality and labour status, table lfsa_pganws). However, the table does not contain data at the level of individual countries of citizenship, and only data on total population and on nationals could be useful for this project.
Estimates of the 2002-2006 stock of the EU27 citizens cannot be prepared using the LFS tables in the Eurostat database. These considerations need to be taken into account when proposing computation methods for the current study.

3 Proposed methods of estimating population stocks by citizenship, sex and age

The current section presents the theoretical background of the methods proposed for the calculation of the missing elements in population stocks by age, sex and citizenship group. After a brief summary of the notation, the following methods are discussed: interpolation of five-year into one-year age groups, regarded as a preparatory method (Section 3.2), followed by cohort-wise interpolation of population stocks (3.3), cohort-component projections, traditionally used in demography (3.4), and cohort-wise weights propagation (3.5). Further, Section 3.6 describes selected proportional fitting methods, a category that encompasses three approaches, depending on the availability of information: the proportional adjustment, direct proportional fitting and iterative proportional fitting. Section 3 concludes by presenting some auxiliary methods for dealing with the Unknown categories, and for the estimation of missing elements of age distributions (3.7).
| Country | 2002 | 2003 | 2004 | 2005 | 2006 |
|---|---|---|---|---|---|
| Austria | + | + | + | + | - |
| Belgium | + | x | - | - | - |
| Bulgaria | - | - | - | - | - |
| Cyprus | dref | - | - | - | - |
| Czech Republic | + | + | + | + | + |
| Denmark | + | + | + | + | + |
| Estonia | na | na | na | na | - |
| Finland | + | + | + | + | + |
| France | - | - | - | - | - |
| Germany | for, broad age, ±agesex, i | ±age, i | for, i | i | p, i |
| Greece | - | - | i | for, ±sex | for |
| Hungary | for | for | + | + | + |
| Ireland | p, ±ctz, broad age, dref | p, ±ctz, broad age, dref | p, ±ctz, broad age, dref | p, ±ctz, broad age, dref | p, ±ctz, broad age, dref |
| Italy | dref | - | - | -age | -age |
| Latvia | -age | + | + | + | + |
| Lithuania | - | -ctz | -ctz | + | + |
| Luxembourg | - | - | tot, nat | ±ctz, ±age, ±sex | ±ctz, ±age, ±sex |
| Malta | - | - | - | - | - |
| Netherlands | + | + | + | + | + |
| Poland | dref | - | - | -ctz | - |
| Portugal | p, for, -age | p, for | - | - | - |
| Romania | dref | - | + | + | + |
| Slovakia | - | - | for | i | i |
| Slovenia | + | + | + | + | + |
| Spain | + | - | p | + | + |
| Sweden | + | + | + | + | + |
| United Kingdom | - | ±ctz, dref | ±ctz, dref, a70 | ±ctz, broad age, dref | - |
| Iceland | + | + | - | - | - |
| Liechtenstein | - | - | - | - | - |
| Norway | + | + | + | + | + |
| Switzerland | + | + | + | + | + |

Legend:
+ : data provided to Eurostat;
- : data not provided to Eurostat;
-age : no disaggregation by age;
-ctz : no disaggregation by citizenship;
±age : age disaggregation only for a few citizenship categories;
±agesex : disaggregation by age not provided for Males and Females;
±ctz : data provided for a few citizenship categories;
±sex : disaggregation by sex provided for a few citizenship categories only;
a70 : age provided only until 70 years, with the open-ended group 70+;
broad age : data disaggregated by broad age groups;
dref : reference date different than 1st January;
for : data provided for foreigners only;
i : data inconsistency problems;
na : data not available;
nat : data provided for nationals;
p : provisional data;
x : problems detected in the data sent by the NSI;
tot : data provided for Total.

Table 1: Availability of data on population stock by citizenship, sex and age in the JMQ, 31 countries, as of 1st January 2002-2006.
3.1 Notation and basic concepts

Throughout the paper, the notation used for population variables follows a common convention presented below. In all cases, the superscript $n$ indicates one of the three broad groups of citizenship: nationals, EU27 foreigners or non-EU27 foreigners, abbreviated as N, EU and nEU, respectively, thus reflecting the composition of the European Union as of 1st January 2007. The non-EU27 group also includes stateless persons. The abbreviation FOR is used for the whole foreign population (EU27 and non-EU27 together). For clarity of presentation, the country index is omitted, as all calculations proposed in the paper are country-specific. The variables in question are as follows:

Stock variables:
$P^n(x, t)$ - Population in broad citizenship group $n$, aged $x$ years on 1st January of year $t$.
$P^n(x, c)$ - Population in broad citizenship group $n$, aged $x$ years at the census date $c$.

Event variables:
$B^n(t)$ - Births during calendar year $t$ in citizenship group $n$;
$D^n(x, t)$ - Deaths of persons aged $x$ years, belonging to citizenship group $n$, during calendar year $t$;
$I^n(x, t)$ - Registered immigration of persons in citizenship group $n$, aged $x$ years, during calendar year $t$, regardless of the country of origin;
$E^n(x, t)$ - Registered emigration of persons in citizenship group $n$, aged $x$ years, during calendar year $t$, regardless of the country of destination;
$R^n(x, t)$ - Outcome of the regularisation of the status of formerly irregular residents (cf. [4]) aged $x$, in year $t$, by definition referring only to foreigners, $n \in \{EU, nEU\}$, thus with $R^N(x, t) = 0$;
$S^n(x, t)$ - Statistical adjustment (official correction) of the size of the population aged $x$, in year $t$, for reasons other than regularisations;
$A^n(x, t)$ - Acquisitions of citizenship by the population aged $x$, in year $t$, by definition referring only to foreigners, $n \in \{EU, nEU\}$, with $A^N(x, t) = 0$.
Unless noted otherwise, age is reported in years reached during a given calendar year, and thus the events in question (deaths, migrations, citizenship changes, etc.) correspond to parallelograms with vertical sides on the Lexis diagram. An illustration of the relevant concepts on a Lexis plane is shown in Figure 2, in Section 3.4.

Whenever necessary, the index denoting sex is added as an additional subscript $g \in \{m, f\}$ for males and females, respectively, e.g. $P_f^n(x, t)$ refers to female population stock, and $D_m^n(x, t)$ to deaths among males. In order to distinguish five-year age groups, an additional left-hand side subscript '5' is added. For example, ${}_5P_m^n(x, t)$ refers to the male population belonging to broad citizenship group $n$ which was aged $[x, x+5)$ years on 1st January of year $t$. The same principle applies to almost all event variables ($D$, $I$, $E$, $R$, $S$ and $A$), with the clear exception of $B$.

In some instances, for clarity of presentation, the summation of a particular variable over a given index is indicated by an asterisk in the respective place, e.g. $A^{nEU}(*, t) = \sum_x A^{nEU}(x, t)$ refers to all acquisitions of citizenship by non-EU27 foreigners in year $t$, regardless of age. Similarly, $I^*(x, t) = \sum_n I^n(x, t)$ denotes all immigrants aged $x$, in year $t$, irrespective of their citizenship, and $D^*(*, t) = \sum_n \sum_x D^n(x, t)$ refers to all deaths registered in year $t$, without respect to nationality or age. It has to be noted that in several cases the summation over $n$ involves only two components, e.g. $n \in \{EU, nEU\}$ for $R^n(x, t)$ and $A^n(x, t)$.

3.2 Interpolation of five-year age groups into one-year groups

Among the preparatory steps for the estimation of missing data, the most frequent problem concerns disaggregation of five-year age groups of population (or events) into single years.
This has to be performed in order to enable cohort-wise interpolations, cohort-component projections with yearly steps, or cohort-wise weights propagation, as described in Sections 3.3, 3.4 and 3.5.

If auxiliary information is available from a different source (e.g. from a census, from the previous or next year, etc.), the population size or the number of events can be disaggregated using a 'Prorating' method [11, p. 5-61], whereby the relative distribution from the auxiliary source is imposed on the data in question. The obtained distribution might need to be further adjusted to marginal totals, by means of proportional fitting procedures, described in Section 3.6.

If the data on population stocks by sex, broad citizenship group and five-year age group ${}_5P^n(x, t)$ are available and the stocks by sex and one-year age group $P^*(x, t)$ are also known, then, assuming no other information about the distribution by single years, we can estimate the missing distributions for particular citizenship groups proportionally, that is as:

$$P^n(x+i, t) = {}_5P^n(x, t) \cdot P^*(x+i, t) \, / \, {}_5P^*(x, t).$$

This is an example of the application of the direct proportional fitting described in Section 3.6.2.

If none of the above information is available, the proposed methodology is to use the well-known interpolative four-term third-difference solution of Karup and King [11, p. 5-65]. For each five-year group, the disaggregation into fifths is done by applying multiplicative coefficients to the global value of this group and the neighbouring ones. Different multipliers are used for the first group, the middle groups and the last group, as set forth in Table 2. For example, if we want to split a middle five-year group with population $N_i$ into five single-year groups $n_1, n_2, n_3, n_4, n_5$, then:

$$n_1 = 0.064 \, N_{i-1} + 0.152 \, N_i - 0.016 \, N_{i+1},$$
$$n_2 = 0.008 \, N_{i-1} + 0.224 \, N_i - 0.032 \, N_{i+1}, \text{ etc.}$$

When Karup-King multipliers are used, the condition $N_i = n_1 + n_2 + n_3 + n_4 + n_5$ is automatically fulfilled.
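The proportional disaggregation of a five-year stock by the single-year profile of the total population, described above, can be sketched in Python as follows. This is only a minimal illustration: the function name and the input figures are hypothetical and do not come from the study.

```python
def split_by_total_age_profile(group_stock_n, total_single_ages):
    """Split one five-year stock of citizenship group n into single ages.

    group_stock_n     -- 5P^n(x, t), the five-year stock of group n
    total_single_ages -- [P*(x, t), ..., P*(x+4, t)], single-year totals
    Returns the estimates [P^n(x, t), ..., P^n(x+4, t)].
    """
    five_year_total = sum(total_single_ages)  # 5P*(x, t)
    if five_year_total == 0:
        # Nothing to distribute over an empty age group.
        return [0.0] * len(total_single_ages)
    return [group_stock_n * p / five_year_total for p in total_single_ages]


# Hypothetical figures, for illustration only.
total_single = [100.0, 110.0, 120.0, 130.0, 140.0]
foreigners_five_year = 60.0
single_n = split_by_total_age_profile(foreigners_five_year, total_single)
```

By construction, the single-year estimates add up to the five-year stock they were derived from.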
In the table below, columns 2-4 apply to the first group $N_0$, columns 5-7 to the middle groups $N_i$, and columns 8-10 to the last group $N_k$; each column gives the coefficient of the indicated five-year group.

| | $N_0$ | $N_1$ | $N_2$ | $N_{i-1}$ | $N_i$ | $N_{i+1}$ | $N_{k-2}$ | $N_{k-1}$ | $N_k$ |
|---|---|---|---|---|---|---|---|---|---|
| First fifth | +0.344 | -0.208 | +0.064 | +0.064 | +0.152 | -0.016 | -0.016 | +0.112 | +0.104 |
| Second fifth | +0.248 | -0.056 | +0.008 | +0.008 | +0.224 | -0.032 | -0.032 | +0.104 | +0.128 |
| Third fifth | +0.176 | +0.048 | -0.024 | -0.024 | +0.248 | -0.024 | -0.024 | +0.048 | +0.176 |
| Fourth fifth | +0.128 | +0.104 | -0.032 | -0.032 | +0.224 | +0.008 | +0.008 | -0.056 | +0.248 |
| Last fifth | +0.104 | +0.112 | -0.016 | -0.016 | +0.152 | +0.064 | +0.064 | -0.208 | +0.344 |

Source: [11], Table C-1, p. 5-69.
Table 2: Coefficients for the Karup-King interpolation formula.

As an alternative to the Karup-King interpolation, the six-term fifth-difference interpolative formulae of Sprague or Beers can be applied, which however use information from more surrounding groups. Methodological details can be found in Shryock et al. [11]. In our case, the Karup-King interpolation is recommended for the sake of simplicity.

For variables depicting non-vital events, like migration or citizenship acquisitions, the estimates for particular cohorts can be obtained from two neighbouring period-age estimates yielded by the Karup-King formula, each split equally in half. For the first cohort, we can assume that a half of the relevant period-age events concern the cohort in question, while for the last, open-ended cohort, we can add up the period-age estimate for the open-ended group and a half of the events concerning the age group immediately preceding the last one. The underlying rationale is an assumption that non-vital events are equally spread over the period-age squares of the Lexis diagram. In any case, the estimates for the eldest cohorts would be close to zero for all practical migration-related applications.
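As an illustration of how the multipliers in Table 2 are applied, the following Python sketch splits a sequence of closed five-year groups into single years. The function name and the input figures are hypothetical; an open-ended group such as 85+ is assumed to be excluded from the input and handled separately.

```python
# Karup-King multipliers from Table 2. Each row is one fifth; the three
# values are the coefficients of three neighbouring five-year groups.
FIRST = [(0.344, -0.208, 0.064), (0.248, -0.056, 0.008),
         (0.176, 0.048, -0.024), (0.128, 0.104, -0.032),
         (0.104, 0.112, -0.016)]       # applied to (N_0, N_1, N_2)
MIDDLE = [(0.064, 0.152, -0.016), (0.008, 0.224, -0.032),
          (-0.024, 0.248, -0.024), (-0.032, 0.224, 0.008),
          (-0.016, 0.152, 0.064)]      # applied to (N_{i-1}, N_i, N_{i+1})
LAST = [(-0.016, 0.112, 0.104), (-0.032, 0.104, 0.128),
        (-0.024, 0.048, 0.176), (0.008, -0.056, 0.248),
        (0.064, -0.208, 0.344)]        # applied to (N_{k-2}, N_{k-1}, N_k)


def karup_king(groups):
    """Split closed five-year groups into single-year values."""
    if len(groups) < 3:
        raise ValueError("at least three five-year groups are needed")
    last = len(groups) - 1
    singles = []
    for i in range(len(groups)):
        if i == 0:
            coeffs, window = FIRST, groups[0:3]
        elif i == last:
            coeffs, window = LAST, groups[last - 2:last + 1]
        else:
            coeffs, window = MIDDLE, groups[i - 1:i + 2]
        for row in coeffs:
            singles.append(sum(c * v for c, v in zip(row, window)))
    return singles


# Hypothetical five-year stocks, for illustration only.
five_year = [500.0, 520.0, 480.0, 450.0]
single_year = karup_king(five_year)
```

Because the coefficient of the group being split sums to one over the five fifths, and those of its neighbours sum to zero, each block of five single-year values reproduces its five-year total.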
Regardless of the method, if the disaggregation is performed on data broken down by sex or citizenship, the final estimate might need to be obtained by proportional fitting methods (described in Section 3.6), in order to ensure the summation to available marginal totals.

3.3 Cohort-wise interpolation of population stocks

Given the information on the age structures of the population for two non-adjacent moments of time, a simple idea to obtain the missing figures for in-between moments would be to apply interpolation techniques. In this case, we propose cohort-wise interpolation for all cohorts apart from the youngest and oldest ones, which are discussed separately. Overall, this method requires much less input information than the cohort-component projections presented in the next section, but it requires information about the population both before and after the moment for which the estimates are to be made. The interpolative approach is recommended for the cases where (a) the span between two points with available data is not wide (say, two to three years), and (b) no information on the distribution of vital and migratory events by citizenship is available.

In practical applications, such as those described in Section 4, it often happens that the data are available for year $t$ from the census conducted at time $c$ ($t < c < t+1$), and for 1st January of the year $t+k$, not immediately following the census. Such situations can be put in a general framework illustrated on a Lexis diagram in Figure 1, where $a$ denotes the fraction of a year remaining after the census until 31st December. Figure 1 and the methodology proposed below also cover the situations when data come from sources other than the census, and the situations when the reference date of the data for year $t$ is 1st January. In the latter case it suffices to set $a = 1$.
For the cohorts already existent at the census date $c$, the interpolation can follow various patterns, but an arithmetic and geometric pattern of growth [3, 10] will be the most frequent choices. As noted by Rowland [10, p. 50], "under arithmetic growth, successive population totals differ from one another by a constant amount [, while] under geometric growth they differ by a constant ratio". For short-period interpolations, both approaches should yield similar results, although this is an empirical issue, and there is no convincing argument in favour of either of them. Hence, a selection of appropriate methods should rely on case-specific judgements.

[Figure 1: Cohort-wise interpolation of population stocks: a general idea. Source: Own elaboration.]

It has to be noted that the cohort aged $x+k$ completed years on 1st January $t+k$ was split at the census date between two age groups: the younger one (aged $x$ completed years) and the older (aged $x+1$), as shown in Figure 1. Therefore, the interpolative estimate of $P^n(x+i, t+i)$ depends on $P^n(x, c)$, $P^n(x+1, c)$ and $P^n(x+k, t+k)$. Given the above, the formula for an interpolative estimate of population sizes belonging to a particular age group $x+i$ and citizenship group $n$, assuming the linear pattern of change, is as follows:

$$P^n(x+i, t+i) = \frac{k-i}{k-1+a} \cdot \left[ a \cdot P^n(x, c) + (1-a) \cdot P^n(x+1, c) \right] + \frac{i-1+a}{k-1+a} \cdot P^n(x+k, t+k), \tag{1a}$$

while for the geometric change:

$$P^n(x+i, t+i) = \left\{ \left[ a \cdot P^n(x, c) + (1-a) \cdot P^n(x+1, c) \right]^{k-i} \cdot P^n(x+k, t+k)^{\,i-1+a} \right\}^{1/(k-1+a)}. \tag{1b}$$

For the youngest and oldest cohorts, for which interpolation as proposed above is not possible, a simplified solution is proposed. In such cases, we suggest taking the average shares (proportions) of the sizes of the respective age groups in the total population, calculated from the data available for neighbouring periods, weighted by the distance between the available data points and the estimation point.
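Formulas (1a) and (1b) can be sketched in Python as follows. The function names and the input figures are hypothetical; the sketch covers only cohorts present at both data points and leaves out the youngest and oldest cohorts, which the text treats separately.

```python
def cohort_at_census(p_x_c, p_x1_c, a):
    """Size at the census date of the cohort aged x on 1st January of the
    census year: a * P^n(x, c) + (1 - a) * P^n(x+1, c), where a is the
    fraction of the year remaining after the census."""
    return a * p_x_c + (1 - a) * p_x1_c


def interpolate_linear(start, end, i, k, a):
    """Formula (1a): arithmetic interpolation for 1st January of year t+i.

    start -- cohort size at the census date (see cohort_at_census)
    end   -- P^n(x+k, t+k), cohort size on 1st January of year t+k
    """
    span = k - 1 + a  # time from the census date to 1st January of t+k
    return (k - i) / span * start + (i - 1 + a) / span * end


def interpolate_geometric(start, end, i, k, a):
    """Formula (1b): geometric interpolation; requires positive stocks."""
    span = k - 1 + a
    return (start ** (k - i) * end ** (i - 1 + a)) ** (1 / span)


# Hypothetical figures: census three quarters of a year before 31st
# December (a = 0.75), next available stock three years later (k = 3).
a, k = 0.75, 3
start = cohort_at_census(1000.0, 1040.0, a)
end = 1100.0
linear_year1 = interpolate_linear(start, end, 1, k, a)
geometric_year1 = interpolate_geometric(start, end, 1, k, a)
```

For a short span such as this one, the two variants give very similar values, which is in line with the remark above that the choice between them is largely an empirical matter.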
In order to ensure consistency of the results and summation of the age-specific estimates to the marginal totals by sex or citizenship group, whenever available, the estimates have to be adjusted by means of proportional fitting, presented in Section 3.6.

The framework presented above can be easily generalised to a much less frequent situation with interpolation between two censuses. In such a case, a fraction $p$ of a year between the 1st January of the year of the second census and the second census date, $c'$, should be additionally accounted for. However, the estimates obtained in such cases would be only very approximate, due to a usually large time span between the censuses.

It should be noted that a solution identical to the one shown above in (1a) or (1b) can be used for extrapolating cohort sizes beyond the available data points, in either direction. It would suffice to put an appropriate integer $i \le 0$ for the backward extrapolation (in particular, following the example from Figure 1, set $i = 0$ to obtain values for the beginning of the census year), or $i > k$ for the forward extrapolation.

The methods discussed above resemble to some extent the ones presented in the Human Mortality Database Methods Protocol [15], with the exception of the oldest age groups, where the quoted study suggests more sophisticated extinct cohort and survivor ratio approaches. Direct application of the methods proposed by Wilmoth et al. [15] would be, however, difficult. This is not because of computational reasons, but rather due to the lack of yearly estimates of deaths, births and migratory events broken down by citizenship groups, which has been listed at the beginning of the current section as a precondition for selecting the cohort-wise interpolation method.

3.4 Cohort-component projections

As concerns projections, let us denote by $X^n(x, t)$ a sum of all event variables not related to the natural change of population stocks (i.e.
all but births and deaths), thus:

$$X^n(x, t) = I^n(x, t) - E^n(x, t) + S^n(x, t) + \sum_{k \in \{EU, nEU\}} A^k(x, t), \quad \text{for } n = N; \tag{2a}$$

$$X^n(x, t) = I^n(x, t) - E^n(x, t) + S^n(x, t) + R^n(x, t) - A^n(x, t), \quad \text{for } n \neq N. \tag{2b}$$

Given (2a) and (2b), the population accounting equations for each broad citizenship group are:

$$P^n(0, t+1) = B^n(t) - D^n(0, t) + X^n(0, t); \tag{3a}$$

$$P^n(x, t+1) = P^n(x-1, t) - D^n(x, t) + X^n(x, t), \quad \text{for } x \in \{1, 2, \ldots, x_{max}-1\}; \tag{3b}$$

$$P^n(x_{max}, t+1) = \left[ P^n(x_{max}-1, t) + P^n(x_{max}, t) \right] - D^n(x_{max}, t) + X^n(x_{max}, t). \tag{3c}$$

In (3c), $x_{max}$ stands for the highest (open-ended) age group for which information is available. Note also that deaths and other event variables in age group $x_{max}$ refer to a trapezoid on the Lexis diagram rather than to a parallelogram, while those for age group 0 refer to a right triangle, as shown in Figure 2.

[Figure 2: Relationships between population stocks $P^n$, and events $B^n$, $D^n$ and $X^n$ on a Lexis diagram. Source: Own elaboration.]

Under the assumptions presented above, the projection is made following the equations (3a), (3b) and (3c) for consecutive years, on the basis of information available for single-year age groups, decomposed from the five-year groups, if needed.
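One projection step following the accounting equations (3a)-(3c) can be sketched in Python as below, for a single citizenship group $n$. The function name, the list-based data layout and the input figures are hypothetical; the net non-vital events $X^n(x, t)$ are assumed to have been computed beforehand from (2a) or (2b).

```python
def project_one_year(pop, births, deaths, other):
    """Project stocks of one citizenship group from 1st January t to t+1.

    pop    -- [P^n(0, t), ..., P^n(xmax, t)], last entry open-ended
    births -- B^n(t)
    deaths -- [D^n(0, t), ..., D^n(xmax, t)]
    other  -- [X^n(0, t), ..., X^n(xmax, t)], net non-vital events
    Returns [P^n(0, t+1), ..., P^n(xmax, t+1)].
    """
    xmax = len(pop) - 1
    if xmax < 1:
        raise ValueError("at least two age groups are needed")
    nxt = [0.0] * (xmax + 1)
    nxt[0] = births - deaths[0] + other[0]                    # (3a)
    for x in range(1, xmax):
        nxt[x] = pop[x - 1] - deaths[x] + other[x]            # (3b)
    nxt[xmax] = (pop[xmax - 1] + pop[xmax]
                 - deaths[xmax] + other[xmax])                # (3c)
    return nxt


# Hypothetical figures with ages 0, 1, 2 and an open-ended group 3+.
pop_t = [100.0, 98.0, 97.0, 300.0]
pop_t1 = project_one_year(pop_t, births=105.0,
                          deaths=[1.0, 0.0, 0.0, 20.0],
                          other=[2.0, 1.0, -1.0, 3.0])
```

The projected total equals the initial total plus births, minus deaths, plus net non-vital events, so the step preserves the overall population balance.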
Note that the default citizenship of a newborn child can differ between the countries, either following the ius soli principle, whereby a child acquires the citizenship of the country of birth, or ius sanguinis, according to which a child inherits the citizenship of its parent(s), or finally a mixture of those two, for example differentiating between the generations of migrants, taking into account the length of stay in the country, etc. The general rules are as follows:

a) Ius sanguinis

If the child gets the citizenship of any of the parents, then $B^n(t)$ in equation (3a) may be assumed to be roughly proportional to $P^n(t)$. If the child acquires the citizenship of the mother and we have no separate estimate of fertility for nationals and foreigners, then $B^n(t)$ may be assumed to be roughly proportional to $P_f^n(t)$. If the estimates of fertility by broad citizenship and age of mother exist, then a better estimate may be obtained using the formula:

$$B^n(t) = B^*(t) \cdot \frac{\sum_x f^n(x) \, P_f^n(x, t)}{\sum_k \sum_x f^k(x) \, P_f^k(x, t)}, \tag{4}$$

where $f^n(x)$ denotes age-specific fertility rates for women in age group $x$, belonging to the group of citizenship $n$. If the estimates of fertility are available by broad citizenship group, but not by the age of mother, the formula (4) would have to be modified, so that the summation over age reflects only the female population aged 15-49 years.

b) Ius soli

If the child automatically acquires the citizenship of a given country, then the balance equation for the youngest age group, (3a), becomes, depending on the citizenship in question:

$$P^N(0, t+1) = B^*(t) - D^N(0, t) + X^N(0, t), \quad \text{for } n = N; \tag{3a'}$$

$$P^n(0, t+1) = X^n(0, t) - D^n(0, t), \quad \text{for } n \neq N. \tag{3a''}$$

In mixed cases, it is recommended to project one part of births according to the formulas for ius soli and another part according to the ius sanguinis principle.

Note also that losses of citizenship are not accounted for, as in most instances they concern persons who in reality are either already living abroad, or emigrating (and counted in $E$). For acquisitions of citizenship, we assume that non-nationals fall into the category of nationals upon naturalization, in order to count the same people only once, regardless of the number of citizenships they have.

If the breakdown by citizenship group of all variables referring to vital and migratory events can be assumed proportional to the citizenship structure of the population at the beginning of each year, then the projection methodology can often be de facto simplified to proportional adjustment / decomposition, whereby the citizenship distribution of the considered cohort in the previous year would directly apply to all cohorts except the first and the last one in each year. In particular, this situation applies if the following four conditions hold:
1. Total population by age, $P^*(x, t)$, is known for successive years, but the citizenship structure is missing;
2. We may assume that the distribution of deaths and migration flows by broad citizenship is the same as the citizenship composition of the population;
3. Acquisitions of citizenship may be ignored;
4. There was no regularization, or it may be ignored.

In such cases, the projection equation (3b) combined with proportional fitting is equivalent to the proportional decomposition of ${}_5P^*(x, t)$ by citizenship group described in Section 3.6.1. The estimations can be performed using the formula:

$${}_5P^n(x, t) = {}_5P^*(x, t) \cdot {}_5P^n(x-1, t-1) \, / \, {}_5P^*(x-1, t-1). \tag{5}$$

The first and the last cohort may be disaggregated using the citizenship composition of the first and last age group in the previous year. In such cases, the following formulas apply:

$${}_5P^n(0, t) = {}_5P^*(0, t) \cdot {}_5P^n(0, t-1) \, / \, {}_5P^*(0, t-1), \tag{6a}$$

or:

$${}_5P^n(x_{max}, t) = {}_5P^*(x_{max}, t) \cdot {}_5P^n(x_{max}, t-1) \, / \, {}_5P^*(x_{max}, t-1). \tag{6b}$$

3.5 Cohort-wise weights propagation

In some cases, too much information on the age-sex-citizenship distribution of the components of population change is missing, which renders projections too dubious with respect to the number of assumptions that need to be made. In practice, in such instances the only reliable information comes from the population census and from annual population stocks available in the DEMO domain of the NewCronos database. Hence, the proposed procedure is as follows.

For the census population, apply the structure by citizenship, taken from each five-year age group, to the respective single-year age groups (i.e. from age group 0-4 to single ages 0, 1, ..., 4; from 5-9 to 5, 6, ..., 9, etc.). Let $w^n(x, c) = P^n(x, c) / P^*(x, c)$ denote the age-specific shares ('weights') of citizenship group $n$ in the census. Further, set $a$ as the fraction of the calendar year before the census date. It is implicitly assumed that the census population in single-year age groups can be divided between 'older' and 'younger' cohorts using the $a$ and $(1-a)$ partition.

For the census date, use the following formula to calculate the share of citizenship group $n$ in the cohort that was aged $x$ years on 1st January of the census year:

$$w^n(x+a, c) = \frac{(1-a) \cdot P^n(x, c) + a \cdot P^n(x+1, c)}{(1-a) \cdot P^*(x, c) + a \cdot P^*(x+1, c)}, \quad \text{for } x < x_{max}; \tag{7a}$$

$$w^n(x_{max}+a, c) = P^n(x_{max}, c) \, / \, P^*(x_{max}, c). \tag{7b}$$

For the 1st January of the census year assume that the weights $w^n(x, t) = w^n(x+a, c)$.
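The cohort weights at the census date, as defined in (7a) and (7b), can be computed along the following lines. This Python sketch uses a hypothetical function name and made-up census figures, and assumes the census counts have already been split into single years of age.

```python
def census_cohort_weights(pop_n, pop_total, a):
    """Shares of citizenship group n in cohorts, following (7a) and (7b).

    pop_n     -- [P^n(0, c), ..., P^n(xmax, c)], census counts of group n
    pop_total -- [P*(0, c), ..., P*(xmax, c)], census counts of all groups
    a         -- fraction of the calendar year elapsed before the census
    Returns [w^n(0+a, c), ..., w^n(xmax+a, c)].
    """
    xmax = len(pop_n) - 1
    weights = []
    for x in range(xmax):
        num = (1 - a) * pop_n[x] + a * pop_n[x + 1]
        den = (1 - a) * pop_total[x] + a * pop_total[x + 1]
        weights.append(num / den if den else 0.0)             # (7a)
    last_total = pop_total[xmax]
    weights.append(pop_n[xmax] / last_total if last_total else 0.0)  # (7b)
    return weights


# Hypothetical census counts for ages 0, 1 and an open-ended group 2+,
# with the census held a quarter of the way through the year.
weights = census_cohort_weights([10.0, 20.0, 30.0],
                                [100.0, 100.0, 100.0], a=0.25)
```

Setting the weight to zero where the denominator is empty is a choice made for this sketch only; the text does not prescribe how such cells should be treated.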
For the 1st January of the year following the census year ($t > c$), assume in turn: $w^n(x, t) = w^n(x-1+a, c)$, for $0 < x$ [...] 1). The starting value $k = 1$ also defines the initial estimate of the joint sex-age-citizenship distribution, $P_g^{(1)\,n}(x, t) = P_g'^{\,n}(x, t)$. Subsequent steps are computed as follows:

$$P_g^{(2k)\,n}(x, t) = P_g^{(2k-1)\,n}(x, t) \cdot P_g^{*}(x, t) \, / \, P_g^{(2k-1)\,*}(x, t); \tag{12a}$$

$$P_g^{(2k+1)\,n}(x, t) = P_g^{(2k)\,n}(x, t) \cdot P_{\cdot}^{n}(*, t) \, / \, P_{\cdot}^{(2k)\,n}(*, t). \tag{12b}$$

The procedure defined by (12a) and (12b) is repeated iteratively until some convergence criterion is met. For example, the estimates yielded by consecutive steps should differ by no more than an arbitrarily selected small number $s$. More details of the method have been discussed by Willekens [13, pp. 69-71], Willekens et al. [14], Rees [8] and Norman [6].

Although the IPF method is purely mechanical, its main advantage is that it does not require any additional information (such as data on vital events or migration) or excessive labour resources, and the obtained results (in terms of joint distributions by all variables under study) are automatically coherent with the marginal distributions of particular variables. Moreover, under some general assumptions, the IPF estimates can be interpreted from a statistical viewpoint as joint probability distributions obtained using the maximum likelihood or entropy maximisation methods [2, pp. 83-97; after: 13, p. 70].

3.7 Auxiliary methods

Among the auxiliary methods proposed in the current study, the foremost one is the decomposition of the Unknown category wherever it appears (i.e., with respect to age, citizenship, or even sex, as in the case of Greece for 2005). The universal solution proposed in such cases is a proportional disaggregation: population belonging to the Unknown category is broken down proportionally to the existing, well defined categories (citizenship groups, age groups, etc.)
and the resulting parts are attached to these categories. For example, if total population $P$ consists of $n$ well-defined groups $P_1, \ldots, P_n$, and the Unknown category, $P_{unk}$, such that $P = \sum_i P_i + P_{unk}$, where $i = 1, \ldots, n$, then the following corrections apply:

$$P'_j = P_j + P_{unk} \cdot P_j \, / \, \sum_i P_i = P_j \left( 1 + P_{unk} \, / \, \sum_i P_i \right), \quad \text{for all } j, \text{ with } i = 1, \ldots, n. \tag{13}$$

If some elements of age structures are missing (e.g. tails of respective age distributions, or a breakdown into five-year groups given the availability of broader ones), we may either use a structure from a different year or fit a mathematical function to available data. For example, we can assume that foreign population stocks are a double-exponential function of age, as originally proposed for the intensity of migration flows by Rogers and Castro [5, 9]. The number of foreign population aged x,