Stefano Allegrezza¹

BORN DIGITAL AND DIGITIZED ARCHIVES: ADDRESSING THE ISSUE OF FILE FORMAT OBSOLESCENCE

Abstract

Purpose: The obsolescence of file formats presents a significant challenge for the long-term preservation of digital archives. As technology advances, existing formats become outdated, requiring conversion to newer formats to maintain usability. This issue affects both born-digital and digitized records and is often exacerbated by vendors who promote new formats through planned obsolescence.

Methods: To address file format obsolescence, the general approach of file format conversion is proposed. This involves systematically converting digital records from outdated formats to more current ones, based on a review of international standards and guidelines.

Results: Despite the recognized importance of file format conversion, its practical implementation remains limited. Barriers include insufficient knowledge and expertise, as well as a lack of clear operational guidelines and recommendations, which impedes widespread adoption.

Discussion: This paper aims to bridge these gaps by exploring the rationale behind format conversion (Why), determining appropriate timings for conversion (When), and detailing methodologies (How) supported by standards and best practices. Through this framework, it seeks to advance practical solutions to the ongoing challenge of digital preservation in the face of format obsolescence.

Keywords: Digital archives, File formats, Migration, Conversion, Obsolescence

¹ Stefano Allegrezza, PhD, University of Macerata, Italy. E-mail: stefano.allegrezza@unimc.it.
ARCHIVI NATI DIGITALI E DIGITALIZZATI: AFFRONTARE IL PROBLEMA DELL’OBSOLESCENZA DEI FORMATI DI FILE

Abstract

Purpose: The obsolescence of electronic formats is a significant challenge for the long-term preservation of digital archives. Continuous technological development makes existing formats obsolete, and vendors sometimes encourage the transition to new formats through planned obsolescence. The problem affects both born-digital records and digitized ones, obtained by converting analogue records to digital form.

Methods: To address format obsolescence, the strategy of format conversion is generally proposed. It involves transferring digital records from obsolete formats to more current ones, based on a review of international standards and guidelines.

Results: Despite the recognized importance of format conversion, its practical implementation is still limited. The main barriers include a shortage of adequate knowledge and skills, together with a lack of clear operational guidelines and recommendations, which hinders wider adoption of the practice.

Discussion: This article aims to fill these gaps by exploring the reasons behind format conversion (Why), the appropriate moment to carry it out (When) and the methodologies to follow (How), supported by standards and good practices. Through this framework, it aims to promote practical solutions to the continuing challenge of digital preservation in the face of format obsolescence.
Keywords: Digital archives, File formats, Migration, Conversion, Obsolescence

DIGITALNO IN DIGITALIZIRANO ARHIVSKO GRADIVO: REŠEVANJE PROBLEMA ZASTARELOSTI FORMATOV DATOTEK

Abstract

Purpose: File format obsolescence is an important challenge for the long-term preservation of digital archives. As technology advances, existing formats become obsolete, which requires conversion to newer formats in order to maintain usability. This problem affects both born-digital and digitized records and is often aggravated by vendors who promote new formats through planned obsolescence.

Methods: To address file format obsolescence, the general approach of file format conversion is proposed. It involves the systematic conversion of digital records from obsolete formats to newer ones, based on a review of international standards and guidelines.

Results: Despite the recognized importance of file format conversion, its practical implementation is still limited. Barriers include a lack of knowledge and expertise, as well as a lack of clear operational guidelines and recommendations, which hinders wide adoption.

Discussion: The purpose of this paper is to fill these gaps by exploring the reasons for format conversion (Why), determining appropriate timings for conversion (When) and describing in detail the methodologies (How) supported by standards and best practices. With this framework, it seeks to develop practical solutions for the lasting preservation of digital content in the face of format obsolescence.

Keywords: digital archives, file formats, migration, conversion, obsolescence

1.
INTRODUCTION

One of the increasingly pressing issues in digital archives is the conversion of file formats, which affects both born-digital and digitized records. Indeed, a digital archive may include text, still images, audio, video, or other digital media created natively in digital form, but it may also include digital records obtained by a digitization process that transforms records from an analogue format to a digital format. This typically involves scanning physical records, such as paper records or photographs, capturing audio recordings, or processing any other non-digital media to create digital copies that can be stored and accessed electronically. For example, scanning a paper record or converting a vinyl record into an MP3 file are instances of digitization. The primary aim is to preserve the content in a more durable and accessible digital form without altering its original structure or function. In both cases, we are dealing with digital records encoded in a certain file format that will sooner or later become obsolete.

More generally, all records, regardless of the category to which they belong (text documents, images, audio and audiovisual records, technical records, etc.), are encoded in a wide variety of file formats, which are bound to become obsolete sooner or later (and in some cases already are). As developers identify and incorporate new features, file formats evolve, and new formats (or new versions) are brought to market. Over time, old formats (or their old versions) become obsolete, and new generations of software phase out support for those formats, with the result that it becomes increasingly difficult to view or reproduce records encoded in obsolete formats.
The international archival community has identified the solution to the problem of file format obsolescence in the conversion strategy (Digital Preservation Coalition, 2015; Archives of New Zealand, 2024), which involves converting a digital record from a format usable in a given hardware and software environment to a format usable in another (usually later) environment while retaining its significant properties. This means that content encoded in older formats is converted to new formats that can be used on more modern computers.

This is a strategy that must be implemented early, when changes in hardware and software begin to threaten the usability of digital content. Indeed, it is a critical process: it does not allow for delays and must be executed as soon as new formats are defined and before the current format becomes obsolete. If one generation of new formats is missed, records may be difficult to recover; if multiple generations are missed, records may even be unrecoverable. Conversion cycles must be relatively frequent, since few digital documents today are able to survive for more than 10 to 15 years without any format conversion. However, it is necessary to keep in mind that each conversion entails a loss of information (even if in some cases minimal), so it is essential to reduce the number of conversions and steer toward formats that are expected to be more “long-lived”.

Although the problem of file format obsolescence appears less serious today than it might have appeared a few decades ago, thanks also to increased attention to this critical issue, it should not be underestimated. It is important to be able to identify precisely when file formats need to be converted, to choose the most suitable formats for conversion, and to use the correct methodologies from an operational point of view.
Currently, many digital archives include digital records acquired in the 1990s and encoded in outdated formats that urgently need to be converted. Unfortunately, it must be acknowledged that, despite being theorized as one of the most effective digital preservation strategies, format conversion is still not sufficiently practiced. Even when it is necessary to transfer digital objects from an obsolete format to a more modern one, in many cases the process is delayed due to lack of knowledge and expertise, as well as lack of guidelines and technical-operational recommendations on how to carry out the conversion and what tools to use. This paper aims to address this issue by highlighting the rationale behind format conversion (why), when such conversion should be done (when), and the methodologies for implementing it (how), referring to both international standards and guidelines.

2. SOME TERMINOLOGICAL ISSUES

Before proceeding with the discussion, it is necessary to make a few terminological clarifications. Unfortunately, the terminology used in the scientific literature is not always homogeneous. For example, the ISO 13008 standard, both in its first version, published in 2012, and in the subsequent one, released in 2022, uses the terms ‘conversion’ and ‘migration’. The former (conversion) is defined as «the process of changing records from one format to another» while the latter (migration) is defined as «the process of moving records from one hardware or software configuration to another without changing the format». In essence, the first operation consists in converting a record from one file format to another; the second consists in transferring a record from one storage medium to another without altering its file format.
Similar definitions are provided by the ISO 15489-1 standard, according to which ‘conversion’ is the «process of changing records from one format to another» and ‘migration’ is the «process of moving records from one hardware or software configuration to another without changing the format». According to ISO 30300, ‘conversion’ is defined as «changing records from one format to another» while ‘migration’ consists of «moving records from one hardware or software configuration to another». Of the same opinion is the dictionary produced within the InterPARES 2 project, which defines ‘conversion’ as «the process of transforming a digital document or other digital object from one format, or format version, to another one» and ‘migration’ as «the process of moving or transferring digital objects from one system to another».

On the contrary, the Society of American Archivists (SAA) Dictionary of archival terminology defines ‘format migration’ as «the practice of converting an electronic file to a different standard file type to circumvent obsolescence», i.e. the operation that the above-mentioned sources call ‘conversion’. A note to the entry specifies that «in format migration, the content is preserved but the bits are not. In past use, the term has been less precise and could have included media migration». The same term, ‘migration’, is also used in the case of transfer from one storage medium to another; in fact, the dictionary defines ‘media migration’ as «the practice of copying records from one physical carrier to another for preservation» but also attributes substantially the same meaning to the term ‘conversion’, defined as the process «to move data to a different format, especially data from an obsolete format to a current format».
The dictionary also provides an entry for the term ‘migration’, which is defined as «the process of moving data from one information system or storage medium to another to ensure continued access to the information as the system or medium becomes obsolete or degrades over time».

The ISO 14721 standard deals in depth with the subject of conversion, to which it dedicates the entire Chapter 5, ‘Preservation Perspectives’, outlining the different motivations that may lead to conversion operations and the various types of conversion. The terminology used is ‘digital migration’, defined as «the transfer of digital information, while intending to preserve it, within the OAIS».

Even the ‘Digital Preservation Handbook’ (DPC, 2015) only uses the term ‘migration’ and distinguishes between ‘format migration’ and ‘storage media migration’; the former is defined as «transfer or transformation (i.e. migration) of data from an obsolete/old format to a new format, possibly using new application systems at each stage to interpret the information». To further clarify the meaning, an example is also given: «moving from one version of a format standard to a later standard is a version of this method; for example, moving from MS Word version 6 (from 1993) to MS Word for Windows 2010».

In summary, some authors do not use the term ‘conversion’ but only the term ‘migration’, applying it from time to time to the migration from one file format to another or from one storage medium to another. It is no coincidence that Fleischhauer and Bradley wrote that in the field of digital preservation, the term migration is used in two ways. Media or system migration refers to moving digital files from obsolete data storage media or an obsolete data management system to new media or a new system. Media migration is sometimes called physical migration and media upgrading.
In this form of migration, ‘the bits do not change’. In contrast, format migration, also known as logical migration, refers to the movement of content from one format to another: ‘the bits change’. There are also authors who use the terms ‘migration’ and ‘conversion’ interchangeably, and others who give them opposite meanings. All this, unfortunately, contributes to a certain level of confusion in a field that is already quite complex and would benefit from much greater clarity.

Recently, the ‘Regulation (EU) 2024/1183 of the European Parliament and of the Council of 11 April 2024’ also referred in several places to the concepts of conversion and migration as operations necessary to ensure long-term preservation. In particular, it specifies that «a legal framework for qualified electronic archiving services should be established, inspired by the framework of the other trust services set out in this Regulation. The legal framework for qualified electronic archiving services should offer trust service providers and users an efficient toolbox that includes functional requirements for the electronic archiving service, as well as clear legal effects when a qualified electronic archiving service is used. Those provisions should apply to electronic data and electronic documents created in electronic form as well as paper documents that have been scanned and digitised. When required, those provisions should permit the preserved electronic data and electronic documents to be ported on different media or formats for the purpose of extending their durability and legibility beyond the technological validity period, while preventing loss and alteration to the extent possible».
Conversion and migration are also mentioned in Article 45j, “Requirements for qualified electronic archiving services”, which lists, among the requirements to be met by qualified electronic archiving services, that of ensuring that electronic data and documents «are preserved in such a way that they are safeguarded against loss and alteration, except for changes concerning their medium or electronic format».

In the following, we will use the terminology of the ISO 13008 standard and thus speak in general of ‘format conversion’ and ‘media migration’. In addition, we will speak for simplicity’s sake of “digital records” (often abbreviated to “records” so as not to make the discussion more cumbersome) and “digital archives,” but it is understood that the concepts and thoughts set forth here can be applied, in general, to all types of digital objects, as much to those in a digital archive as to those in a digital library or any other repository.

3. WHY TO CONVERT FILE FORMATS

When it comes to file format conversion, the first issue to consider is the reason why the conversion is necessary. According to the ISO 13008 standard, there are four reasons:

- obsolescence: records are encoded in obsolete but still readable formats and therefore need to be converted to more modern formats.
- ownership issues: records are encoded according to proprietary file formats and must therefore be converted to non-proprietary formats.
- interoperability reasons: records must be converted to a file format that guarantees full interoperability with certain technological infrastructures.
- legal reasons: records must be converted according to explicit legal or regulatory requirements concerning file formats or service providers.
When converting the file format of a record, the result may be one of the following:

a) Replacing one format with another

This first scenario occurs when conversion is necessary to maintain access to records in the digital archive and ensure that they are fully available and usable over time. For example, this may be due to changes in the software tools used in the digital archive, abandonment of legacy formats at risk of obsolescence, or changes in the standard format used by the digital archive for online publication. To maintain access, file formats need to be converted not only because they age naturally and may become a risk, but also for reasons related to technological changes. In other words, a file format may still be current but need to be converted because the technological environment used to manage the digital archive has changed. If file formats are not converted proactively, over time one could find oneself unable to access records or use them in the desired manner, or forced to use special and expensive software to regain access. However, when one format replaces another, the original files may be deleted, which could pose risks in the long run.

b) Creating an additional version in a different file format to meet usability requirements

In this second scenario, rather than converting a record to a new format, additional versions of that record are created in different formats to enable new forms of access and use, such as sharing or publishing information, using information in new ways, and aggregating information from various sources. This occurs, for example, when a document created using a word processing format (such as DOCX) needs to be converted to another format (such as PDF) to be published online. This does not imply that the original format is obsolete, but that it is necessary to have the same document in more than one format to fulfil certain requirements.
A typical example is a digital archive containing a series of images obtained as the result of a digitisation project. Usually, an image is produced in TIFF format at the highest possible quality for preservation purposes (‘master’ format) and from this a series of images is produced in JPG format, possibly with different resolutions and qualities, for use purposes (‘derived’ formats); images may also be produced in GIF format to be used as thumbnails. However, formats should not be multiplied unnecessarily: if a single format satisfies all access needs, it is usually the best solution.

4. WHEN TO CONVERT FILE FORMATS

Once it has been established that it is necessary to convert records from one file format to another, it is necessary to decide when it is appropriate to carry out this conversion. There are basically three approaches, which depend largely on the motivations behind the conversion process (but may also depend on the technical environment or other requirements of the preservation system): a) on-demand conversion; b) early conversion; c) late conversion. It is important to analyse them in detail.

a) On-demand conversion

With this strategy, conversion is dynamic in that it is carried out ‘on demand’, whenever a request to do so is received. It is generally performed on a single document at a time but is also applicable to mass (batch) conversions. This strategy can also be used to replace formats, but more often it is used to create additional versions of records in different formats, as required.
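As a rough illustration of on-demand conversion, the sketch below keeps a single stored record and produces an additional rendition only when a request arrives, without storing the result. The registry `CONVERTERS`, the function `render_on_demand` and the plain-text-to-HTML converter are hypothetical names introduced for this example only; a real digital archive would plug in its own conversion tools.

```python
import html
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical registry: (source extension, target extension) -> converter.
Converter = Callable[[bytes], bytes]
CONVERTERS: Dict[Tuple[str, str], Converter] = {}


def txt_to_html(data: bytes) -> bytes:
    """Illustrative converter: wrap UTF-8 plain text in a minimal HTML page."""
    text = data.decode("utf-8")
    page = f"<html><body><pre>{html.escape(text)}</pre></body></html>"
    return page.encode("utf-8")


CONVERTERS[(".txt", ".html")] = txt_to_html


def render_on_demand(record: Path, target_ext: str) -> bytes:
    """Return the record in the requested format without storing the result."""
    source_ext = record.suffix.lower()
    if source_ext == target_ext:
        return record.read_bytes()  # already in the requested format
    converter = CONVERTERS.get((source_ext, target_ext))
    if converter is None:
        raise ValueError(f"no converter registered for {source_ext} -> {target_ext}")
    return converter(record.read_bytes())
```

Because nothing is written back to storage, only the original record occupies space; the price is that every request pays the conversion cost again.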
The advantages are manifold: only one document needs to be archived and not the different versions of that document in the various formats required, thus minimising the need for storage space; it is not necessary to convert a large number of records at once, which could be burdensome for the system and time-consuming; adding new records to the system is simple, as it is not necessary to provide all the required formats in advance; the system can be updated to provide different formats as required, again without having to process all existing records in advance. However, there are also downsides: there is almost no way to guarantee the quality of the converted files, so if this strategy is adopted it is necessary to ensure that the conversion process is sufficiently reliable for one’s needs; on-demand conversion may also be slow or overly burdensome for systems, depending on the size, complexity and number of conversions, forcing the user to wait too long.

b) Early conversion

With this strategy, the conversion of records into different formats is performed as soon as possible (but not on demand). Early conversion is a strategy that is performed not for preservation purposes but for management purposes. In other words, the objective is to convert a set of records into a format that best suits the needs of the digital archive, even if the previous format is not yet obsolete. For example, if the archive manager has decided to use a new format provided by updated software, it is possible to convert all records from the previous format to the new one. This strategy has many advantages. The number of different file formats to be supported is greatly reduced by converting records into a set of standardised formats.
This can mean that information is always encoded in the current formats, thus reducing support, maintenance and software licence costs; the risk of document format obsolescence becomes negligible. The user has the opportunity to review the information and guarantee its quality. With frequent conversion, these processes are streamlined and each conversion benefits from previous experience. Of course, there are also disadvantages: the records are converted more frequently and, as the number of conversions increases, both the risk of information loss and the costs associated with the conversion operations increase in parallel; if the new formats are quite recent, the conversion tools may not be as readily available, may have bugs or may not handle complex or unusual records well, and this too may affect the cost and quality of the conversion process; the new format may not be as widely adopted, so additional formats may need to be created to share records with users who have not yet upgraded, with the consequence that archiving all converted records will require more space than on-demand conversion.

c) Late conversion

With this strategy, conversion is postponed until the last useful moment. Of course, the definition of ‘last useful moment’ varies greatly depending on the risk/benefit assessment of the digital archive. This strategy has many advantages: records are converted less frequently, thus minimising the risk of information loss and reducing the costs associated with the conversion operation; if the target format is widely adopted, more conversion tools are likely to be available and these are likely to be able to handle unusual or complex records better, because there has been time to resolve bugs and special cases; some records may have exceeded their retention time and thus be destined for discarding, thus avoiding the need to convert them.
There are also disadvantages: in the digital archive there will be a greater variety of different formats in use and this may increase the support, maintenance and licensing costs of the software, reduce flexibility in the choice of different software and prevent older information from being usable in newer contexts; more file formats may need to be converted at the same time, which complicates the management and quality assurance of the conversion; if it is necessary for the same records to be accessible in several formats, storing all the records in the various formats will require more space than on-demand conversion; finally, the identification of the ‘last useful moment’ may be wrong and one may find that it is now too late and the conversion of some records is no longer economically or technically feasible.

Each of these different strategies (on-demand, early and late conversion) has advantages and disadvantages, with different risks due to the timing of the conversion. There is no one-size-fits-all strategy and only by assessing the needs of the digital archive can the right balance between costs and benefits be determined and the most appropriate strategy implemented.

5. HOW TO CONVERT FILE FORMATS

According to the ISO 13008 standard, the file format conversion process follows four basic steps: a) planning, b) testing, c) conversion, and d) validation (see Figure 1). In the following, the characteristics of each stage will be examined.

Figure 1. The four key steps of a conversion process

(a) Planning. This is the most important step, because the greatest likelihood of a successful conversion process comes from careful and thorough planning. First, the records to be converted need to be assessed to understand what characteristics and features need to be preserved in the conversion process.
These requirements may not be immediately obvious, and you will need to work with the digital archive manager and users to understand whether the records in the formats from which you are converting have particular characteristics that you want to ensure remain unchanged, and to ensure that all their requirements are met. Some conversion processes only change the file format of the records, but others also change some of their properties. As a rule of thumb, records with a very simple structure (such as those in .txt format) can be expected to pass through the conversion process without significant change, but complex records will almost always undergo some form of modification.

It is important to keep in mind that any conversion process potentially exposes records to the risk of information loss. Therefore, prior to conversion, it is necessary to identify the “significant properties”, i.e. the properties of the records that must be retained during the conversion process and that must ‘survive’ the conversion without (or with little) change. In this regard, it is important to mention that the research project devoted more than any other to these aspects was InSPECT (Investigating the Significant Properties of Electronic Content Over Time), funded in the United Kingdom by JISC between March 2007 and March 2009 under the “Repositories and Preservation” program². The project aimed to establish a methodology for identifying “significant properties” of various categories of digital records. By “significant properties,” the project initiators mean the characteristics of digital records that must be preserved over time, even following transformation operations.
For example, some of these properties are the content of the records, the metadata that contextualizes their production and function, their appearance (e.g., layout, colors, etc.), the purpose for which they were produced, or even their logical structure³. It is important to make sure that the file formats to which records are to be converted are able to support ‘significant properties’ and that the conversion process is able to maintain them during conversion. If the new file format does not support the required features, the choice of that file format would need to be reevaluated.

² The project was led by the Arts and Humanities Data Service (AHDS) Executive in collaboration with The National Archives. Later, after the AHDS was discontinued in March 2008, it was led by the Centre for e-Research (CeRch) at King’s College London, again in collaboration with The National Archives. Much of the material produced under the project is still available today.

³ These significant properties vary depending on the category of digital document: the project focused on four categories of records: 1) raster images; 2) e-mails; 3) text documents; and 4) sound documents. For example, in text formats it will be important to consider content, font, etc.; in image formats it will be important to consider resolution, color depth, etc.; in audio formats it will be important to consider sampling rate, bit depth, and so on.

Some less obvious features, usually related to complex or hidden features of the format, must also often be considered. The following is a non-exhaustive list of some of them (The National Archives of United Kingdom, 2011, 18–20).

- embedded metadata. Many formats allow various descriptive metadata to be embedded.
For example, some formats for textual documents embed metadata to identify the author, date of creation, date of last modification, etc.; some formats for photographs embed metadata indicating camera settings, geographic location at the time the shot was taken, etc.⁴ This is often relevant information, and it is worth considering whether such embedded metadata also needs to be converted and whether conversion tools are capable of transferring it.
- embedded objects. Many complex formats allow digital objects to be embedded in various formats. For example, text documents may contain embedded images or spreadsheets; presentations may contain audio and video content. Not all conversion tools can handle all types of embedded objects. Therefore, checks must be made on documents with embedded objects to ensure the quality of the conversion process.
- scripts and macros. Some formats may contain code written with programming languages. For example, text documents may include macros to automate common tasks. In general, scripts and macros do not ‘survive’ conversion processes, unless they are conversions from one version of a format to another version of the same format⁵. Therefore, if support for scripts and macros is absolutely necessary, it may be necessary to manually rewrite them into the version intended by the new format.
- digital signatures. Some categories of records allow digital signatures to be embedded within them (or to have digital signatures in external systems attached to those records). In format conversion, the binary sequence changes; therefore, after conversion the digital signature will lose its validity and strategies will need to be found to maintain the legal value of the document.
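The point about digital signatures can be made concrete with a small sketch. Here the “conversion” is simulated, purely as an assumption for illustration, by re-encoding the same text from Windows-1252 to UTF-8: the content is unchanged, but the byte sequence and therefore its SHA-256 digest differ, so any signature computed over the original bytes would no longer match the converted file. The function names and the sample text are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte sequence as a hex string."""
    return hashlib.sha256(data).hexdigest()


def simulate_conversion(original: bytes) -> bytes:
    """Stand-in for a format conversion: re-encode cp1252 text as UTF-8."""
    return original.decode("cp1252").encode("utf-8")


# Illustrative record content; the accented letter is one byte in cp1252
# and two bytes in UTF-8, so the two byte sequences differ.
original = "Verbale n. 12, città di Macerata".encode("cp1252")
converted = simulate_conversion(original)

same_content = original.decode("cp1252") == converted.decode("utf-8")
same_bits = sha256_hex(original) == sha256_hex(converted)

print("content preserved:", same_content)  # True
print("bits preserved:", same_bits)        # False
```

Real conversions between file formats alter the byte stream far more extensively than this re-encoding, which is why a signature bound to the original bytes cannot simply be carried over.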
A very important step in the planning phase is the choice of the tool for converting records from one format to another. This is a difficult choice because numerous tools exist, whether proprietary, freeware or open source. However, not all file formats enjoy the same level of ‘coverage’. For popular formats, such as common image formats, several programs are available, but for niche or older formats the choices can be very limited. For formats with poor support, it may be necessary to perform two conversions, using an intermediate file format to bridge the gap between the format to be converted and the desired format6. In some cases, it may be necessary to commission ad hoc software to perform the conversion, especially if the file formats are very specific. It is important to assess whether the software can fully convert the significant properties of the documents and the related metadata, and not just whether it ‘simply’ converts from the source format to the target format.

4 Examples are EXIF metadata in photographs in JPG format and ID3 metadata in sound documents in MP3 format.

5 For example, from the DOCX format of Microsoft Word 2007 to the DOCX format of Microsoft Word 2013.

(b) Testing. Once the tools to be used have been identified, and before starting the conversion process in its entirety, it is a good idea to carry out a testing phase on a representative sample of the records to verify that their significant properties and metadata are converted accurately and without loss of authenticity, reliability, integrity, and usability. The test requires accurate knowledge of the source and target file formats and of the hardware and software configuration.
The test must ensure not only that an acceptable level of quality is achieved in converting the record, but also that the metadata is converted with the same level of quality; this requires the use of metadata extraction tools to compare the source and target metadata. It may also be necessary to use different metadata extraction tools for the source and target formats and to convert their results into a common form to facilitate comparison. Finally, it would be desirable to identify metrics that measure the level of conversion quality automatically or semi-automatically, so that the most suitable software tool can be chosen7.

6 For example, in the case of converting old text documents created in the 1980s using the well-known WordStar word processor, one could assume a first conversion from the WordStar format (.WS) to the format of the first versions of Microsoft Word (.DOC) and then a second conversion from the latter to the format introduced with Microsoft Word 2007 (.DOCX).

7 Projects that have addressed quality assurance include AQUA (Automating Quality Assurance), ; SPRUCE (Sustainable PReservation Using Community Engagement), ; SCAPE (Scalable Preservation Environments), . See also P. Wheatley, B. Middleton, J. Double, People Mashing: Agile Digital Preservation and the AQuA Project, .

(c) Conversion. Once you have gained a sufficient understanding of the information and its environment and have selected formats and tools, you are ready to begin the document conversion operation. At this stage it is very important to set the parameters of the conversion software tool correctly (Bajcsy et al., 2010). Some conversion tools operate only on a single document; if you want to convert multiple records, you need to write a script to automate the processing of a batch of records or use tools that can operate on entire batches of records.
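The kind of batch script mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions, not a recommended tool: `convert_batch`, the `Converter` signature, the `.old` and `.txt` extensions and `placeholder_converter` are all hypothetical names introduced for illustration, and the placeholder merely re-encodes text so that the example runs without any external software.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# Hypothetical signature: a single-file converter takes a source and a target path.
Converter = Callable[[Path, Path], None]


def convert_batch(source_dir: Path, target_dir: Path, source_ext: str,
                  target_ext: str, convert_one: Converter) -> List[Tuple[Path, str]]:
    """Apply a single-file converter to every matching file under source_dir.

    Returns a log of (source file, outcome) pairs so that failures can be
    reviewed afterwards instead of being silently skipped.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    log: List[Tuple[Path, str]] = []
    for src in sorted(source_dir.rglob(f"*{source_ext}")):
        # Mirror the source folder structure in the target folder.
        dst = target_dir / src.relative_to(source_dir).with_suffix(target_ext)
        dst.parent.mkdir(parents=True, exist_ok=True)
        try:
            convert_one(src, dst)
            log.append((src, "ok"))
        except Exception as exc:  # record the failure and continue with the batch
            log.append((src, f"failed: {exc}"))
    return log


def placeholder_converter(src: Path, dst: Path) -> None:
    """Stand-in for a real tool: reads Latin-1 text and rewrites it as UTF-8."""
    dst.write_text(src.read_text(encoding="latin-1"), encoding="utf-8")


# Small demonstration on temporary files.
with tempfile.TemporaryDirectory() as tmp:
    src_dir, dst_dir = Path(tmp) / "in", Path(tmp) / "out"
    src_dir.mkdir()
    (src_dir / "a.old").write_bytes(b"first record")
    (src_dir / "b.old").write_bytes(b"second record")
    batch_log = convert_batch(src_dir, dst_dir, ".old", ".txt", placeholder_converter)
    outcomes = [(path.name, status) for path, status in batch_log]
    converted_names = sorted(path.name for path in dst_dir.iterdir())

print(outcomes)
print(converted_names)
```

In practice `convert_one` would wrap whichever conversion tool was selected and tested in the earlier phases, with its parameters fixed in one place; the outcome log then gives the validation phase a list of records that need attention.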
Today there are also many companies that perform file format conversion as a service, including through an online interface, and that undertake to ensure the confidentiality of the records sent for conversion.

(d) Validation. At the conclusion of the conversion process, an additional validation process (beyond that of the initial test) is required to verify that significant properties and metadata have indeed remained unchanged or whether, on the contrary, losses have occurred. Manually checking each document is usually impractical; therefore, the best solution is to check a representative sample of converted records, comparing the original document with the converted one. The form of this direct comparison depends on the type of document. For example, while images can be analyzed visually, sound objects will require listening to the original and the converted document to see if there are any differences. Automatic comparison tools, including tools based on artificial intelligence, can also be used. Similarly, it is necessary to compare, using appropriate extraction tools, the metadata of the original and the converted document to verify that they too have been converted correctly. The end users of the information should be involved in this process, as they may detect subtle problems that non-users would not notice.

Even if the original records are intended to be discarded after conversion, it is a good idea to keep them for a certain period of time to reduce the risk that significant properties not taken into account, and thus lost in the conversion process, emerge over time. This is true even if quality verification processes have shown that the conversion was fully successful.
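The sample-based check and the metadata comparison described for the validation phase can be sketched as follows. This is only an illustration under stated assumptions: the field names in `FIELD_ALIASES` are hypothetical examples of what two different extraction tools might return, the metadata dictionaries are invented, and a simple random sample is just one possible reading of "representative" (stratifying by format or by record series may be more appropriate).

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical mapping from tool-specific field names to a common vocabulary.
FIELD_ALIASES = {
    "Author": "creator", "dc:creator": "creator",
    "CreateDate": "created", "dcterms:created": "created",
    "Title": "title", "dc:title": "title",
}


def normalize(metadata: Dict[str, str]) -> Dict[str, str]:
    """Map tool-specific field names onto common names and trim the values."""
    return {FIELD_ALIASES.get(key, key): value.strip()
            for key, value in metadata.items()}


def compare_metadata(source: Dict[str, str], target: Dict[str, str]
                     ) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the fields whose values differ or that are missing on one side."""
    src, dst = normalize(source), normalize(target)
    return {key: (src.get(key), dst.get(key))
            for key in sorted(set(src) | set(dst))
            if src.get(key) != dst.get(key)}


def draw_sample(record_ids: Sequence[str], size: int, seed: int = 0) -> List[str]:
    """Draw a reproducible simple random sample of record identifiers."""
    rng = random.Random(seed)
    return sorted(rng.sample(list(record_ids), min(size, len(record_ids))))


# Invented metadata, as two different extraction tools might report it.
source_md = {"Author": "M. Rossi", "CreateDate": "1998-03-12", "Title": "Minutes"}
target_md = {"dc:creator": "M. Rossi", "dcterms:created": "1998-03-12"}

differences = compare_metadata(source_md, target_md)
sample = draw_sample([f"rec-{i:03d}" for i in range(200)], size=10)

print(differences)  # the title is present in the source but missing in the target
print(sample)
```

A fixed seed makes the sample reproducible, so that the same records can be re-examined later, for instance when deciding how long the originals need to be retained.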
It is not possible to determine a priori how long originals should be retained, as this will depend on the importance of the records, the rationale behind the conversion, the organization’s risk tolerance, its confidence in the conversion process, the cost of retaining the originals and maintaining the relationships between them and the converted records, and the balance with the need to reduce the amount of records stored.

6. FINAL CONSIDERATIONS

File format conversion has been theorized as one of the most effective preservation strategies, although we are not currently aware of any digital preservation systems that have already implemented the functionality needed to initiate conversion operations, at least in the Italian context. In fact, as already highlighted in the Introduction, even when there is a need to convert records from an obsolete format to a more modern one, the operation is often delayed due to a lack of knowledge and expertise, as well as to the lack of clear and precise guidelines and recommendations. Unfortunately, the few guidelines and recommendations available to date do not provide concrete, operational guidance on the conversion process (e.g., what software tools to use, what metrics to use to measure information loss and test the accuracy and quality of conversions, etc.). Also of concern is the lack of conversion software tools that have been “certified” by an independent authority or certifying body, that give reliable guarantees on the results that can be obtained, and to which conversion processes can be entrusted.
Finally, some significant issues need further consideration, such as the loss of “informational” content that could result from the repeated conversions that will be necessary over time, or the likely need to preserve every version of a document produced by successive conversions into different formats. However, since digital archives already hold many documents that are in obsolete formats and at risk of no longer being readable, it is time to start the first conversion operations. Highly specialized professionals are needed to govern these processes, and it would be desirable to initiate training courses that can provide the necessary knowledge and skills. In addition, it would be desirable for the digital preservation community to begin to focus on these issues and initiate a collaborative effort to develop guidelines and recommendations.

REFERENCES

Automating Quality Assurance Project (AQUA). (n.d.). Retrieved at https://wiki.opf-labs.org/login.action?os_destination=%2Fdisplay%2FAQuA%2FHome (accessed on September 20, 2024).
Archives of New Zealand. (2024). File format migration. Retrieved at https://www.archives.govt.nz/manage-information/how-to-manage-your-information/digital/file-format-migration (accessed on September 20, 2024).
Bajcsy, P., Kooper, B., Marini, L., McHenry, K. and Onderjcek, M. (2010). A Framework for Understanding File Format Conversions. In Proceedings of the 2010 Roadmap for Digital Preservation Interoperability Framework Workshop, (Article no. 10, pp. 1–7). New York, NY, USA: Association for Computing Machinery. Retrieved at DOI: 10.1145/2039274.2039284 (accessed on September 20, 2024).
Digital Preservation Coalition (DPC). (2015). Digital Preservation Handbook, 2nd Edition.
Retrieved at https://www.dpconline.org/handbook (accessed on September 20, 2024).
Fleischhauer, C., and K. Bradley (eds.). (2019). Guidelines for the Preservation of Video Recordings, IASA-TC 06, London, UK: International Association of Sound and Audiovisual Archives. Retrieved at https://www.iasa-web.org/sites/default/files/publications/IASA-TC_06-A_v2019.pdf (accessed on September 20, 2024).
Investigating the Significant Properties of Electronic Content Over Time project (InSPECT). (2010). Retrieved at https://significantproperties.kdl.kcl.ac.uk (accessed on September 20, 2024).
International Research on Permanent Authentic Records in Electronic Systems Project (InterPARES). (8. 11. 2024). The InterPARES 2 Project Dictionary. Retrieved at http://www.interpares.org/ip2/display_file.cfm?doc=ip2_dictionary.pdf (accessed on September 20, 2024).
ISO 13008:2022 – Information and documentation – Digital records conversion and migration process.
ISO 14721:2012 – Space data and information transfer systems – Open archival information system (OAIS) – Reference model.
ISO 15489-1:2016 – Information and documentation – Records management. Part 1: Concepts and principles.
ISO 30300:2022 – Information and documentation – Records management – Core concepts and vocabulary.
Official Journal of the European Union. (2024). Regulation (EU) 2024/1183 of the European Parliament and of the Council of 11 April 2024 amending Regulation (EU) No 910/2014 as regards establishing the European Digital Identity Framework.
Society of American Archivists (SAA). (2024). Dictionary of Archives Terminology. Retrieved at https://dictionary.archivists.org (accessed on September 20, 2024).
Sustainable Preservation Using Community Engagement Project (SPRUCE). (n.d.). Retrieved at https://wiki.opf-labs.org/display/SPR/Home (accessed on September 20, 2024).
Scalable Preservation Environments Project (SCAPE). (n.d.). Retrieved at https://scape-project.eu (accessed on September 20, 2024).
The National Archives of United Kingdom. (2011). File Format Conversion. Retrieved at https://cdn.nationalarchives.gov.uk/documents/information-management/format-conversion.pdf (accessed on September 20, 2024).
Wheatley, P., Middleton, B., Double, J., Jackson, A. N. and McGuinness, R. (2011). People Mashing: Agile Digital Preservation and the AQuA Project. iPRES 2011 - 8th International Conference on Preservation of Digital Objects, Nov. 1–4, 2011, Singapore. Retrieved at https://services.phaidra.univie.ac.at/api/object/o:294255/download (accessed on September 20, 2024).

Summary

The obsolescence of file formats is a pressing issue in the long-term preservation of digital archives. As technology evolves, many formats become outdated, necessitating conversion to newer ones to ensure continued accessibility. This issue affects both born-digital and digitized records, often exacerbated by vendors’ practices of planned obsolescence. This study proposes a systematic approach to address file format obsolescence through file format conversion, guided by international standards and best practices. Despite its importance, the practical application of format conversion remains limited, hindered by gaps in expertise and operational guidelines. This paper addresses these gaps by discussing the necessity (Why), timing (When), and methodologies (How) for format conversion, aiming to provide a practical framework for advancing digital preservation efforts.

The paper examines the challenges posed by file format obsolescence in preserving digital archives, highlighting the need for systematic file format conversion.
It identifies barriers to implementation, including a lack of expertise and clear guidelines, and discusses ways to overcome them through international standards. The study aims to provide a practical framework covering the motivations, timing, and methodologies for format conversion to improve digital preservation practices in the face of advancing technology and planned obsolescence.

Typology: 1.02 Review Article