ATLANTI • 27 • 2017 • n. 1
Besides Standards and Automation: an Experience 
with Census Databases
Bogdan-Florin POPOVICI, Ph.D.
National Archives of Romania, Brașov County Division, Brașov, str. G. Barițiu nr. 34, 500025, Brașov, România
e-mail: bogdanpopovici@arhivelenationale.ro
ABSTRACT
This paper grew out of a practical experience that did not offer the prerequisites usually indicated by standards. Censored (anonymized) census databases, with partial documentation, were transferred to the National Archives and remained in storage with no immediate attempt to process them or make them accessible to users. When they were examined, many issues and questions arose, both technical and conceptual. We share our experience and our reservations, aware that more advanced colleagues (using SIARD or ADDML, for instance) may regard all these as primitive solutions. At the same time, we are convinced that we are not the only ones lacking proper skills and expertise, and that our experience may serve as an example for other colleagues.
Key words: census databases, archival processing
Fra standard ed automazione: un'esperienza con i database censuari
SINTESI
Questo intervento è nato da un'esperienza pratica, che non ha fornito i corretti requisiti necessari indicati di solito dagli standard. Le banche dati di censimenti, con documentazione parziale, sono state trasferite all'Archivio Nazionale e conservate senza alcun tentativo di elaborazione immediata o di renderle accessibili agli utenti. Quando sono state esaminate, sono emerse numerose questioni e domande, sia tecniche che concettuali. Si vuole qui condividere questa esperienza e le sue criticità, pur consapevoli che i colleghi più esperti (che usano ad esempio SIARD o ADDML) possono considerare primitive tutte queste soluzioni. Allo stesso tempo, però, si è convinti di non essere gli unici a non disporre di competenze ed esperienze adeguate, e che questi esempi possono essere utili ad altri colleghi.
Parole chiave: database censuari, trattamento archivistico
Poleg standardov in avtomatizacije: izkušnje s podatkovno bazo popisa prebivalstva
IZVLEČEK
Ta članek je nastal na osnovi praktičnih izkušenj, ki niso zagotavljale ustreznih predpogojev, ki jih običajno predpisujejo standardi. Cenzurirana podatkovna baza popisa prebivalstva z delno dokumentacijo je bila prevzeta v državni arhiv in je ostala shranjena brez namena pristopa k takojšnji obdelavi ali pripravi gradiva za njegovo uporabo. Ko smo pristopili k pregledu podatkovne baze, so se pojavila številna vprašanja, tako tehnična kot konceptualna. V prispevku predstavljamo in delimo svoje izkušnje, pri tem pa se zavedamo dejstva, da lahko bolj napredni kolegi (ki na primer uporabljajo SIARD ali ADDML) vse naše rešitve obravnavajo kot primitivne. Toda prepričani smo, da nismo edini, ki nimamo ustreznih veščin in strokovnega znanja, zato so lahko naši primeri in izkušnje primer za druge kolege.
Ključne besede: podatkovne baze, popis prebivalstva, strokovna obdelava arhivskega gradiva
Bogdan-Florin POPOVICI: Besides Standards and Automation: An Experience with Census 
Databases, 173-182
Dincolo de standarde și automatizare. O experiență cu bazele de date ale recensămintelor 
REZUMAT
Prezentul material prezintă experiența prelucrării bazelor de date ale recensămintelor din 2002 și 2011. Pornind de la cadrul legal, el însuși problematic, articolul urmărește modul în care datele primite au devenit entități funcționale, inteligibile pentru utilizator. Sunt descrise de asemenea alte acțiuni conexe ale SJAN Brașov de completare a documentației recensămintelor, precum și dilemele și răspunsurile noastre raportat la arhivarea bazelor de date de acest tip.
Cuvinte-cheie: prelucrare arhivistică, baze de date, recensăminte
1 Introduction
In 2002 and 2011, general censuses of population and housing were undertaken in Romania. The National Archives, through its departmental units, was required to consider the documentation of these censuses for permanent preservation. A certain procedure was followed and electronic data were transferred to the National Archives.
In 2013, we discovered that, apart from their transfer (one may read “ingest”), the files had not been subjected to any further archival processing in our institution¹. The accompanying documentation was rather poor and, moreover, one may consider the records transferred as only partially relevant for the census. In what follows, we discuss the appraisal of census data, examine the data received and the processing we performed, and offer some considerations on the archival description of databases.
2 Legal framework
Between 18 and 27 March 2002, Romania undertook a census of population and housing. According to the first regulation on the census, after the publication of the final results, the whole series comprising the individual paper forms was to be transferred for preservation to the National Archives and its territorial branches². Six months later, a new government decision was issued, indicating that the primary information from the forms completed by the persons surveyed would be anonymized, written on a magnetic carrier and preserved by the National Archives or its territorial branches. After the publication of the final results, the original forms were to be destroyed³.
Between 7 and 16 May 2011, another census was performed, on the same topic. According to a Government Decision of 2009⁴, after the publication of the final results, the database containing individual data was to be transferred for preservation to the National Archives or its territorial branches. A new Government Decision of 2011 changed the provision again, indicating that, after the publication of the final results, the database in electronic form, containing anonymised personal data, was to be transferred for preservation to the National Archives, while the original forms were to be destroyed⁵. As an extra ingredient, because the 2011 census was declared “the last paper-based census”, the National Archives asked its branches to also take over the original paper forms, as historically relevant documents. In our case, that led to a legal dispute, since the local Directorate for Statistics insisted on the strict application of the Government Decision, that is, refusing to transfer the paper forms and asking for the destruction of the paper originals on grounds of data protection. The local division of the National Archives refused to consent to their disposal, since they had been declared of historical value, so the forms are still kept for preservation in the creator's records centre.
1. Brasov County Division of the National Archives.
2. Hotărârea nr. 680 din 19 iulie 2001 privind organizarea și desfășurarea recensământului populației și al locuințelor din România în anul 2002, în Monitorul Oficial, nr. 439 din 6 august 2001, art. 17.
3. Hotărârea nr. 1505 din 18 decembrie 2002 pentru modificarea Hotărârii Guvernului nr. 680/2001 privind organizarea și desfășurarea recensământului populației și al locuințelor din România în anul 2002, în Monitorul Oficial, nr. 19 din 15 ianuarie 2003.
4. Hotărârea Guvernului nr. 1.502/2009 privind organizarea și desfășurarea recensământului populației și al locuințelor din România în anul 2011, în Monitorul Oficial, nr. 860 din 10 decembrie 2009, art. 16.
5. Hotărârea Guvernului nr. 922 din 21.09.2011 pentru modificarea și completarea Hotărârii Guvernului nr. 1.502/2009 privind organizarea și desfășurarea recensământului populației și al locuințelor din România în anul 2011, în Monitorul Oficial, nr. 689/28.IX.2011, art. 13.
In both cases, the Brasov County Division of the National Archives received for permanent preservation optical media (CDs) carrying the primary data (micro-data) of the censuses, in anonymized form. In what follows, we examine the content received, its technical and historical usefulness and relevance, and describe the archival processing we performed on it. Apart from the transfer process, there were no special regulations or indications about the method of archival processing, so what follows has the character of a case study, with its good, bad and ugly parts. Many of the steps and procedures may look naïve to archivists with expertise in database preservation. But given the circumstances, namely the lack of professional guidance and expertise, we consider that this presentation may be of interest to other colleagues in similar situations.
3 The ‘objects’
The data of the 2002 census were received on a compact disc containing 16 files in *.dbf format, accompanied by a narrative description providing the names of the files, their content, and the coded names, types and possible values of the fields. There were also lists of values, presumably serving as sources for the main tables. The data were not authenticated in any way, neither by hash/checksum nor by digital signature.
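Where a transfer arrives without any fixity information, as here, the receiving archive can at least record checksums at the moment of accession, so that later copies and migrations can be compared against what was received. The following Python sketch illustrates the idea; it is not part of the processing described in this paper, and the function names and the choice of SHA-256 are assumptions made for the illustration.

```python
import hashlib
from pathlib import Path


def build_manifest(folder, algorithm="sha256"):
    """Return {relative file name: hex digest} for every file under folder."""
    root = Path(folder)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.new(algorithm)
        with path.open("rb") as handle:
            # Read in chunks so that large tables need not fit in memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest


def changed_files(old_manifest, new_manifest):
    """List files whose digest differs or that appear in only one manifest."""
    names = set(old_manifest) | set(new_manifest)
    return sorted(n for n in names if old_manifest.get(n) != new_manifest.get(n))
```

A manifest built at accession and stored with the archival description can later be compared, with `changed_files`, against a manifest built from any working copy. Such a manifest only proves that the files have not changed since accession; it says nothing about what happened to them before the transfer.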
The data of 2011 were also delivered on a compact disc, in a digitally signed package. The package could only be opened using proprietary software belonging to the company that issued the digital certificate (basically, it was a digitally signed and encrypted *.p7s file). Inside the package there were 40 files in *.dbf and *.csv format (judging by the file names, the same data in two formats) and files in *.pdf format containing a scanned census form on which the coded names of the database fields were mapped to the plain field names on the form.
The compact discs were kept as such until 2016, when the question of their archival processing and possible use was raised.
4 Archival processing. Making objects understandable
In their original form, we had very little information about these data. As can be seen in Figure 1, opening the files rendered tables of raw data with no apparent meaning. Relying on some database knowledge, one could suspect that some of the figures in the columns were in fact IDs linking to other tables, and not figures meaningful in themselves. To make the data in the main tables understandable, we therefore decided to attempt to link the tables and decode the meaning of fields and data. The option of keeping the files as flat files was rejected, since usability for research was obviously better served if the various data could be filtered and cross-queried.
4.1 Census in 2002
Making the tables of the 2002 census “functional” ran into some technical difficulties. First, the tool chosen for converting the tables to a more functional framework was MS Access 2013, a DBMS that is better known by archivists and offers rather complex functionality. Unfortunately, this version of the software could no longer read files in *.dbf format, which required the use of another tool, at least for conversion. A free tool, DBF Plus, was then used to read the tables from their original *.dbf files and export them to tab-separated text files. After all the tables had been converted, they were imported into an MS Access database.
During the export, neither the table headers nor the correct code page were exported. This had to be fixed in Access by manually creating new field headers and by setting the correct code page to reproduce the Romanian regional settings. The coded headers remained as they were in the tables, while the explicit meaning of each field was rendered only in the form (Figures 2 and 3). Then each field was checked for possible links with the existing lists of values. This operation was, in many respects, an exercise in fortune-telling, since there was no clear description of the source and destination of the data. To keep track of our interventions, we kept the imported tables untouched and clearly marked the new lists of values (Figure 4). In the main table, every linked field was annotated to indicate the source of its data (Figure 2). In this way, a user of the Access database is aware which tables are the “original” ones (in fact, the migrated/imported tables) and which have been refactored by the archivist.
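The two repairs mentioned here, decoding with an explicit code page and re-attaching the coded field names, were done by hand in Access. For colleagues who prefer a scripted route, they can be sketched in Python as follows. This is a minimal illustration and not the procedure we followed: the function name is hypothetical, the field names must come from the creator's description, and the code page is an assumption that has to be verified against the creator's system.

```python
import csv
import io


def read_headerless_export(raw_bytes, field_names, encoding):
    """Decode a headerless tab-separated export and attach field names.

    Each row is returned as a dict keyed by the coded field names
    supplied by the caller; the values themselves are left as exported.
    """
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
    records = []
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != len(field_names):
            # Stop rather than silently shifting values into wrong fields.
            raise ValueError(
                f"line {line_number}: expected {len(field_names)} fields, "
                f"got {len(row)}"
            )
        records.append(dict(zip(field_names, row)))
    return records
```

The function is called with the bytes of one exported table, the list of coded field names, and a codec name such as "cp1250" or "cp852". Decoding the same bytes with a wrong code page does not fail; it quietly produces wrong characters, which is why the choice has to be checked against known values such as locality names.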
Linking the tables was again a very time-consuming task, because a single table could have 50-60 fields. Another issue was that some of the lists of values (LoV) appeared merged as values in the main table but were separate as sources, so a new merged LoV had to be created; basically, we reconstructed the source of the values with little evidence that this was indeed the original. As if all this were not troublesome enough, we discovered that some of the fields described in the documentation as “not used” in fact contained values, whose meaning remained obscure.
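The two checks described in this section, resolving coded values against a list of values without touching the imported table, and detecting fields documented as “not used” that nevertheless hold values, can be illustrated with a small SQLite sketch in Python. The table names, field names and values below are invented for the illustration and do not reproduce the census tables; our own work was done in MS Access.

```python
import sqlite3


def build_demo(conn):
    """Create a tiny stand-in for an imported main table and one list of values."""
    conn.executescript(
        """
        CREATE TABLE persons_imported (id INTEGER, civil_status INTEGER, unused_1 TEXT);
        CREATE TABLE lov_civil_status (code INTEGER PRIMARY KEY, label TEXT);
        INSERT INTO persons_imported VALUES (1, 1, NULL), (2, 2, '7'), (3, 9, NULL);
        INSERT INTO lov_civil_status VALUES (1, 'unmarried'), (2, 'married');
        -- The view decodes values without altering the imported table.
        CREATE VIEW persons_decoded AS
        SELECT p.id,
               p.civil_status AS civil_status_code,
               l.label AS civil_status_label
        FROM persons_imported AS p
        LEFT JOIN lov_civil_status AS l ON l.code = p.civil_status;
        """
    )


def unresolved_codes(conn):
    """Codes present in the main table but absent from the list of values."""
    rows = conn.execute(
        "SELECT DISTINCT civil_status_code FROM persons_decoded "
        "WHERE civil_status_label IS NULL AND civil_status_code IS NOT NULL "
        "ORDER BY civil_status_code"
    )
    return [row[0] for row in rows]


def populated_count(conn, table, column):
    """Count rows where a column documented as 'not used' holds a value."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted here;
    # pass only table and column names taken from the database itself.
    query = (
        f'SELECT COUNT(*) FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL AND LENGTH(TRIM("{column}")) > 0'
    )
    return conn.execute(query).fetchone()[0]
```

The LEFT JOIN keeps every row of the imported table even when a code has no match, so unmatched codes surface as empty labels instead of disappearing from the result. A non-empty result from `unresolved_codes`, or a non-zero `populated_count` on a field declared unused, is exactly the kind of finding that should be recorded in the archival description and raised with the creator.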
4.2 Census in 2011
Most of the issues encountered when converting the tables containing microdata from the 2002 census were the same for the 2011 one. The data were also contained in files in *.dbf format, with coded field names, values representing either data or just pointers to LoV data, and so on. The mapping between the paper forms and the table fields was done by the creator, by indicating the coded names of the fields on a digital copy of the paper forms. In the process of decoding the names, we found that not all the fields had been transferred in the tables: the tables had been altered by removing (at least) one field, containing the National ID Number.
Another issue for the 2011 census files was the size of the database: due to the large amount of information, it grew larger than 2 GB (the size limit of an *.accdb file), forcing us to use separate databases for the various tables (inhabitants, housing etc.) and related entities.
After the successful import and creation of the tables, a set of forms and queries was generated in order to ease reading and searching the data (Figure 3). Since the paper forms are considered a class in the Directorate of Statistics filing plan, it comes naturally to consider these two databases as series. They were described as such in our archival management software, scopeArchiv.
But this archival processing was only part of the process. As seen, our reality was far from a standard, proper SIP, and the metadata were too poor to be aligned with any of the professional standards. Basically, we received a bucket of files in various formats, migrated them and created a tool for accessing the data. We nevertheless felt the need to clarify conceptually these entities, their status (original, copy, derivative for research, etc.) and also their archival nature and value.
5 Debating issues 
5.1 About received packages
The files we received were produced by the creator. We had absolutely no information about how those *.dbf files were generated, whether they had been altered (as we suspected), whether they were the result of an export or a copy, or whether they had undergone any other transformation. We requested information from the local Directorate of Statistics but, unfortunately, due to the distance in time, we could collect information only about the system used for the 2011 census. This request revealed that the application used was a dedicated one, created specifically for the 2011 census. It was designed to run on the Windows XP operating system and was programmed in Visual FoxPro. The *.dbf files were therefore native, copied for us and not migrated. The architecture was client-server, with each county unit consolidating its data on a central server. The application also had a native export tool to *.csv or *.xls format (the alternative files sent to the archives; again, the files transferred seemed to have been exported directly from the application, without other processing). We received no information whatsoever about the anonymization actions. Since the files for transfer were prepared at central level, it is likely that this information was not available at county level.
The files received were supposed to remain unaltered, as proof of what we received. In fact, at least for the 2011 census files, the package was encrypted and could only be opened using software from the digital signature issuer. This is why we decided to keep as versions the package as ingested (version 1), the package decrypted and compressed (by the creator, using WinRAR) (version 2) and the package uncompressed (version 3). Since we do not have a digital archiving solution, the ingested versions (“SIP”), the migrated *.txt and *.pdf files (“AIP”) and the Access database (“DIP”) were all attached to one archival description in scopeArchiv.
5.2 About archival value of the data
A perhaps naïve question may arise concerning the value of the data: who is the envisaged user, and in what does the archival value of these datasets consist? At first sight, the answer is clear: these are valuable micro-data about people and housing, gathered during the census. They are a micro-image of society, important for historians, genealogists and so on.
This “obvious” picture needs some amendment. First, we have proof that the data, at least in the case of the 2011 census, were altered, that is, anonymized. In this regard, strangely enough, it should be noted that even the legal texts ignore the fact that the National Archives can protect personal data, can control the release of sensitive information, and must receive original documents. For both censuses, the underlying legal texts require that the information first be altered and then transferred to the National Archives. Moreover, the Institute of Statistics offers, on its website, access to anonymized microdata “only for scientific research” (see INS-microdate pentru cercetare stiintifica, 2017), quoting European regulations (see Commission Regulation 557/2013/CE, 2013; Regulation (EC) nr. 223/2009), thus also fulfilling, in this regard, the mission of the National Archives. In other words, the documents publicly released by the Institute are basically what the National Archives received, as just another “user” of the data and not as a preserver empowered by the State to maintain original, fully accurate evidence of the information created by public institutions. Leaving aside the institutional misrepresentation, the range of uses over time is heavily affected by the intervention on, and anonymization of, the data. Looking back, the censuses of the 18th or 19th century are very valuable today because, containing “personal data”, they help genealogists, family historians etc. to identify precisely persons, houses and households and to correlate these data in order to reconstruct past events and situations. In our case, anonymized data can only be used for statistical purposes, at the level of streets, districts or communities, raising the level of information above specific persons. While data protection is an understandable and necessary measure in today’s society, I think the impact of applying the same protective measures to historically relevant documents should be examined. We consider that extending the protection periods for census data may be a sufficient way to guarantee the rights of individuals without altering the historical information contained.
But a more serious question arose concerning the data themselves. In direct discussions, statisticians indicated that there were significant errors in the collection of data through the forms and that, indirectly, the same issue affected the databases. This information was confirmed by official documents. According to the Quality Report of the 2011 Population and Housing Census (2011, p. 21-22), the data collected on the forms were rather imprecise (Recensământul Populaţiei şi al Locuinţelor 2011, p. 5-6) and indirect sources were needed for data gathering, correlation and correction (at country level, 5.9% of the data were collected from indirect sources, while for Brasov county the percentage was 6.1). The final results were generated by collecting data from other sources and by applying statistical corrections to the primary data. In other words, the preservation by the Archives of databases with primary data, with no further documentation indicating the quality of the data, the corrections applied and the final results, raises serious questions about the quality of these data as historical sources. With this in mind, we asked the creator to also transfer the files containing the final results, thus preserving the first and the last data sets, though with no firm guarantee that they are indeed what we considered necessary for historical analysis.
In the same regard, an appraisal issue was identified. We do not intend to review here the extensive professional literature and ideas attempting to identify the goals of appraisal, its methods and so on. The idea we would like to emphasize is that the data may be relevant not only in themselves but also in the context in which they were collected. By context, we mean not only the general framework (that is, the census), but also the intentions, the methods used, the tools involved and so on. Therefore, documenting the census is not only about the data, but also about the way the Institute of Statistics did its job. Receiving only some tables of data gives only a slice of reality. In this regard, we also requested the transfer, for both censuses, of the manuals for census field agents, sector maps, the instructions and procedures applied, original forms etc. Despite this extra documentation, it should be noted that the website of the 2011 census contains richer and more relevant documentation than that transferred to the Archives (see Institutul Naţional de Statistică, 2017) and, even though we harvested that website, we may consider that the National Archives failed to properly preserve the relevant documentation for the 2011 census (and also, very likely, for 2002).
6 Lessons learnt. Conclusions
The experience of ingesting digital information about the 2002 and 2011 censuses into the National Archives, and the attempts to preserve it, allows a set of lessons to be drawn regarding both technical and archival aspects.
It is necessary for archivists to be involved in the process of selecting data for preservation. This may be self-evident in theory, but the examples discussed here show that it is not always the case. For many reasons, the archivists’ access to the production system may be restricted, and a selection based on the meaning of the data or on the producer’s considerations may not be good enough. At the same time, appraisal should be based on an extensive analysis of the workflow, in order to establish whether the data intended for preservation are indeed the data that would best serve users’ interests in the future. Not only the content but also the context in which the data were created should be documented, as a way to assess the quality of the data and to understand how they were collected, for what purposes and within which framework.
Due to the versatility of digital data, the creating agency may be tempted to manipulate the original data, for instance in order to apply various protection methods to personal data. In this regard, we consider that the Archives should promote more firmly the need for the original (whatever that means in a digital environment) and their capacity/obligation to ensure the confidentiality, for as long as necessary, of the information transferred. Archives should preserve, as we all know, the evidence of an activity, not some altered copy of the information resulting from that activity. This is why, in our opinion, the archivists’ control over the export of data from the working system may be a way to achieve the extraction of “original” data.
At a technical level, for the data to have a meaning, it is necessary not only to preserve them but also to have an extensive description of the relationships and constraints between the various tables and queries, and of the original working system; basically, to identify the “records” which are composed of those “data”. In practice, the fields and relations may be coded and may not be self-explanatory; the description should make them clearly understandable. Also, the lists of values may be combined in various ways in certain cases, so what is needed is not only the “authoritative lists” but also the sets that are the actual source for the main table(s), so that the data become explicit. If any interventions were made on the original data, they should be explicitly indicated, documenting the type of data removed and the methods used. All of this would reveal not only the meaning, but also the technical context of creating and preserving the data.
Finally, I would like to challenge the duty of the archivist to create systems that make the data understandable for users (in other words, to create a Dissemination Information Package). Data may come to the Archives from a variety of systems, many of them complex enough to require high skills to restore their functionality, which would definitely exceed the competences of an archivist. What if we took the data and just preserved them, with all the semantic, provenance and fixity information required, and only that? Looking to the past, archivists preserved medieval charters even if some of them did not understand Latin; glass plates were kept even if devices for reading the image properly were not available. What if we delivered to the user a zip package containing the tables and the documentation about them (if it exists)? Of course, one of the duties of our profession is to make the holdings available to users. Moreover, these days, when low budgets are a common issue, higher visibility is one way of raising more money, so just keeping material without making it available and without promoting it may not be good business. From this point of view, creating DIPs seems more and more like a professional archival mission, which emphasizes a new dimension of archival management: archivists alone are no longer sufficient for processing historical archives.
References
Commission Regulation 557/2013/CE of 17 June 2013 and its implementation guide. Available at www.insse.ro/cms/files/eurostat/esds/Reg_EC_557_2013_EN.pdf (last visited 1 April 2017).
Hotărârea Guvernului nr. 922 din 21.09.2011 pentru modificarea și completarea Hotărârii Guvernului nr. 1.502/2009 privind organizarea și desfășurarea recensământului populației și al locuințelor din România în anul 2011, în Monitorul Oficial, nr. 689/28.IX.2011, art. 13.
Hotărârea nr. 1505 din 18 decembrie 2002 pentru modificarea Hotărârii Guvernului nr. 680/2001 privind organizarea și desfășurarea recensământului populației și al locuințelor din România în anul 2002, în Monitorul Oficial, nr. 19 din 15 ianuarie 2003.
Hotărârea nr. 680 din 19 iulie 2001 privind organizarea și desfășurarea recensământului populației și al locuințelor din România în anul 2002, în Monitorul Oficial, nr. 439 din 6 august 2001, art. 17.
INS-microdate pentru cercetare stiintifica (2017). Available at http://www.insse.ro/cms/ro/content/ins-microdate-pentru-cercetare-stiintifica (last visited 1 April 2017).
Institutul Naţional de Statistică (2017). Available at http://www.recensamantromania.ro/ (last visited 1 April 2017).
Quality Report of the 2011 Population and Housing Census (2011). Available at http://www.recensamantromania.ro/wp-content/uploads/2015/02/Raport-de-calitate_RPL2011_ENGLISH.pdf (last visited 1 April 2017).
Recensământul Populaţiei şi al Locuinţelor 2011, p. 5-6. Available at http://www.recensamantromania.ro/wp-content/uploads/2013/07/prezentare-rpl-2011__Partea_I.pdf (last visited 1 April 2017).
Regulation (EC) nr. 223/2009 of the European Parliament and of the Council of 11 March 2009. Available at http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32009R0223&from=EN (last visited 1 April 2017).
SUMMARY
In 2002 and 2011, general censuses of population and housing were undertaken in Romania. The National Archives, through its departmental units, was required to consider the documentation of these censuses, in an anonymized form, for permanent preservation. The databases were archivally processed several years after their accession, which raised some issues, since the documentation available about the databases was insufficient and it was necessary to contact key persons at the creator who were aware of the technical context. The paper describes the operations performed to turn the plain data (as submitted) into a fully operational database (as disseminated), with intelligible data. Further on, the paper addresses the quality of the data and their usefulness for research, since the information was anonymized and the quality of the microdata was admitted to be poor even by the collector. A final topic concerns the completeness of information about a census, since the Archives are keeping the initial data, but nothing about the legal, procedural and institutional context of the census.
Typology: 1.01 Original Scientific Article
Submitting date: 19.04.2017
Acceptance date: 05.05.2017