DELIVERABLE D11

Project Acronym: EODOPEN
Grant Agreement Number: 607666-CREA-1-2019-1-AT-CULT-COOP2
Project Title: EODOPEN | eBooks-On-Demand-Network Opening Publications for European Netizens
Project website: https://eodopen.eu/

Guidelines and recommendations for the provision of alternative and special formats based on the survey on special needs of users and technical requirements

Author(s): Alenka Kavčič – Čolić, Tina Glavič, Andreja Hari, Constantin Lehenmeier, Piotr Kožurno

Project Co-funded by the Creative Europe Programme of the European Union

Dissemination Level
P | Public | X
C | Confidential, only for members of the project partners and the Commission Services |

DOCUMENT INFORMATION
Activity number: A11
Activity title: Developing guidelines and recommendations for delivery formats
Contractual date of activity: September 1st 2019 – March 31st 2022 (M1-M29)
Actual date of activity: March 2020 – January 2022
Author(s): Alenka Kavčič – Čolić, Tina Glavič, Andreja Hari, National and University Library (NUK), Constantin Lehenmeier, University Library Regensburg (UREG), and Piotr Kožurno, Nicolaus Copernicus University in Toruń (NCU)
Contributor(s):
Participant(s): EODOPEN-project members
Working group: WG4
Working group title: Delivery formats of digitised material for special needs
Working group leader: Alenka Kavčič – Čolić
Dissemination Level: P

HISTORY OF VERSIONS
Version | Date | Status | Author (organisation) | Description/Approval Level
1 | 23/4/2021 | Draft | Alenka Kavčič – Čolić (NUK), Tina Glavič (NUK), Andreja Hari (NUK) | First draft, checked by WG4 members.
2 | 18/05/2021 | Draft | Constantin Lehenmeier (UREG) | Editorial and new information added
3 | 05/11/2021 | Draft | Alenka Kavčič – Čolić (NUK), Tina Glavič (NUK), Andreja Hari (NUK) | New information added, layout edited, some parts removed and partners’ examples included.
4 | 16/11/2021 | Draft | Constantin Lehenmeier (UREG), Piotr Kožurno (NCU) | Editorial, additional proofreading.
5 | 25/01/2022 | Final | Alenka Kavčič – Čolić (NUK), Tina Glavič (NUK), Andreja Hari (NUK) | Final
6 | 21/1/2022-3/3/2022 | | Peer review group | Document peer review.
7 | 21/3/2022 | For final publ. | Alenka Kavčič – Čolić (NUK), Tina Glavič (NUK), Andreja Hari (NUK) | Corrections of the reviewed version, NUK proofreading

EODOPEN PROJECT SUMMARY
Libraries all over Europe face the difficult challenge of managing 20th and 21st century textual materials which have not yet been digitised because of the complex copyright situation. These works cannot be accessed by the general public and are slumbering deep in library stacks, as they are often out-of-print or have never even been in-print at all, and reprints or facsimiles are out of sight. The EODOPEN project focuses on making 20th and 21st century library collections digitally visible by directly engaging with communities in the selection, digitisation and dissemination processes. As leading partner, the University Library of Innsbruck, joined by 14 European libraries from 11 nations, has set itself the goal to make 15 000 textual materials digitally available and to reach more than 1 million people in Europe by 2024.

Among other goals such as building a common portal to display the project outcomes, EODOPEN aims to stimulate interest in and improve access to 20th and 21st century textual material, including grey and scientific literature. EODOPEN continuously carries out social media campaigns in order to attract new audiences. Furthermore, libraries establish contacts with commemorative institutions all over Europe as well as with researchers and doctoral study boards, history associations and local publishing houses to ask broad audiences for their suggestions. In collaboration with local institutions all project partners select hidden library treasures, deal with rights clearance questions and put new content online. Dissemination activities display the digital content via international channels.
In addition, EODOPEN aims to provide alternative delivery formats, especially those suited to blind or visually impaired users. An international survey asks a broad European public about the use of e-books. Evaluating the survey’s outcome, the project broadens the scope to alternative delivery formats in order to fulfil the needs of blind or visually impaired users.

To promote best practice in rights clearance among the library community, EODOPEN provides handouts and tools to make 20th and 21st century books available beyond the project’s lifetime. In this sense, project partners closely cooperate to develop an online tool for the documentation of rights clearance, especially suited for out-of-print and orphan works. Interactive workshops enquire about the needs when dealing with rights clearance questions in order to set up the tool by implementing the requirements of an international community.

ABSTRACT
The Guidelines and Recommendations for the Provision of Alternative and Special Formats (hereinafter Guidelines) were prepared as one of the deliverables of the EODOPEN project’s working group 4 on Delivery formats of digitised material for special needs. They are based on two surveys: the first was carried out among participating EODOPEN project partners, and the second among users of the EODOPEN partner libraries. The Guidelines have been developed because most of the delivery formats that result from digitisation are hardly accessible to users of mobile devices, and inaccessible to blind and partially sighted users. The Guidelines are aimed at organizations in the field of culture that digitise their collections for broader audiences and are concerned about reaching mobile device users and print-disabled, blind and partially sighted communities. The Guidelines are based on the EODOPEN partners’ experiences. The Guidelines are divided into three parts.
The first consists of the introduction to the Guidelines, their scope and how to implement them, an explanation of key concepts, and the background with a description of the project’s surveys, which were the basis for the Guidelines. In the second part, the Guidelines are explained, and the third presents a list of relevant documents, a vocabulary of some of the terms used and a list of the acronyms used. The Guidelines follow the digitisation workflows in the EODOPEN partners’ organizations and describe the best practices in each of the processing phases undertaken. Special attention is given to full-text generation and possible problems with OCR implementation. A list of recommended delivery formats is given. In a special chapter, the additional phases of text processing for blind and partially sighted users are described, providing recommendations and suggestions for the creation of accessible documents.

Statement of originality: This report contains original unpublished work except where clearly indicated otherwise. Acknowledgement of previously published material and of the work of others has been made through appropriate citation, quotation or both.

Table of Contents
DOCUMENT INFORMATION
HISTORY OF VERSIONS
EODOPEN PROJECT SUMMARY
ABSTRACT
1 Introduction
  1.1 Purpose of the guidelines
  1.2 What is new in them that cannot be found in other guidelines?
  1.3 How to use these Guidelines
  1.4 Key concepts explanation
2 Background
  2.1 Brief presentation of the project EODOPEN
  2.2 Overview of end-users, mobile devices, and delivery formats
  2.3 A brief overview of the surveys’ results on special needs of users and technical requirements for digitisation
    2.3.1 Survey on eBook users
    2.3.2 Survey on blind and partially sighted eBook users
  2.4 The current situation regarding digitisation workflows and access in EODOPEN partners' organizations
  2.5 Different experience regarding alternative and special formats
3 Guidelines and recommendations for the provision of alternative and special formats
  3.1 Digitisation planning
  3.2 Different phases of processing
    3.2.1 Image capturing / scanning
    3.2.2 Image processing and full-text generation
    3.2.3 Post-Correction
    3.2.4 Conversion to different delivery formats
  3.3 Recommended delivery file formats
  3.4 Additional phases of implementation for blind and partially sighted users
    3.4.1 Creating accessible document
      3.4.1.1 OCR check
      3.4.1.2 Design of the document
      3.4.1.3 Structure of the document and graphical elements
      3.4.1.4 Math and science or special symbols
      3.4.1.5 Accessibility check
      3.4.1.6 Exporting to different formats from Microsoft Word
    3.4.2 Creating audiobooks (mp3, DAISY)
  3.5 Using open source software for file format conversion
    3.5.1 Balabolka
    3.5.2 Calibre
    3.5.3 Robobraille
  3.6 Examples of implementation
4 References
5 Key relevant documents, guidelines and sources
  5.1 Digitisation guidelines
  5.2 OCR implementation guidelines
  5.3 Other technical guidelines
  5.4 Guidelines for creating accessible documents
  5.5 Guidelines for creating audiobooks
  5.6 Further useful sources
6 Vocabulary
7 Used acronyms

1 Introduction
In the last decades, digitisation has become an essential part of information societies.
In libraries, reprography functions have been replaced by digitisation, thus enabling immediate access to information and overcoming barriers of distance and technology. One of the problems with digitisation today is that the dominant delivery format for digitised copies is PDF. This format is not user-friendly when accessed through mobile or handheld devices. Blind and partially sighted people have particular difficulty accessing this content, as their reading experience is based on other delivery formats or on content supported by assistive technologies. Besides blind and partially sighted people, there are also other print-disabled people [1] who would benefit from more user-friendly access to digitised materials. We believe that, with additional effort, digitised content could be available to everyone, regardless of the technology used or the difficulties that blind and partially sighted readers/users face.

1.1 Purpose of the guidelines
The aim of the Guidelines and recommendations for the provision of alternative and special formats (hereinafter Guidelines) is to help librarians and other organizations in the field of culture to make digitised content available to broader communities. They build on the EODOPEN partners’ experiences in dealing with users of mobile and handheld devices, and with blind and partially sighted users. They gather experiences from all EODOPEN consortium partners and present them as guides in a systematic way.

1.2 What is new in them that cannot be found in other guidelines?
Most digitisation guidelines are aimed at sighted readers using normal-sized screens. These Guidelines include additional communities of library users. Unlike most guidelines dealing with digitisation or digital preservation, they focus on optimal delivery file formats.
Thus, the Guidelines present the existing digitisation workflows in most of the EODOPEN partners’ institutions, recommendations on optimal delivery formats, descriptions of the procedures for the creation of EPUB publications, and compatibility among mobile devices, file formats, and applications. The EODOPEN Guidelines take a general approach so that they do not become obsolete as information technology develops rapidly, especially concerning mobile devices and applications. They are related to the EODOPEN technical reports on delivery formats:
• D12a - Technical report on the implementation of special formats and conversion services,
• D12b - Report on trial implementations for mobile devices and
• D12c - Report on trial implementations for print disabled users.
The preparation of additional workshop and training materials is planned under activity A13, Compiling training material for delivery formats and training library staff.

[1] “The term ‘print disabled’ was coined by George Kerscher, Ph.D. around 1989 to describe persons who could not access print. He used it to refer to: A person who cannot effectively read print because of a visual, physical, perceptual, developmental, cognitive, or learning disability.” (My Blind Spot. (s.a.). MBS accessibility defined. Available on 4 March 2022 at https://myblindspot.org/mbs-accessibility-defined/)

1.3 How to use these Guidelines
The Guidelines are structured in three parts. The first part consists of the introduction to the Guidelines, their scope, and how to implement them (Chapter 1).
In the second chapter, the context and background of their development are described: the EODOPEN project is briefly introduced, followed by short summaries of the results of three surveys dealing with special needs for delivery formats during digitisation: the first two refer to users of mobile devices and blind and partially sighted users, and the third to the current situation in partner organisations regarding digitisation and delivery formats. The same section also covers the literature overview. The second part (Chapter 3) consists of the Guidelines and recommendations for the provision of alternative and special formats, which are divided into six parts with recommendations on how to plan a digitisation project, which phases should be implemented, and which special requirements of blind and partially sighted users should be taken into account; the chapter also includes some examples of implementation. In the third part, key relevant documents that could be of use as additional guidance are listed (Chapter 5), together with a vocabulary for easier understanding of specific terms used in the Guidelines (Chapter 6).

1.4 Key concepts explanation
In the Guidelines, mobile devices are defined as smartphones, notebooks, and tablet computers as well as e-readers.
In this text, the term blind and partially sighted users is used, following the European Blind Union (EBU), instead of the term blind and visually impaired users.
Digitisation means digital conversion of information on analogue carriers.
Target communities are people that access digitised content in libraries and other organizations in the field of culture.
Usually, the term eBook refers to digitally born publications. However, we use the term eBook to refer to digital publications produced as a result of digital conversion, including formats for special needs (including audiobooks), which is also the objective of the EODOPEN project.
However, this term does not exclude digitally born publications since the delivery format is the same or has the same purpose or functions. eBooks can be accessed through e-readers or simply read on personal computers (PCs) or mobile devices like smartphones, tablets or notebooks.

2 Background
2.1 Brief presentation of the project EODOPEN
The project EODOPEN (eBooks-On-Demand-network Opening Publications for European Netizens) (2019-2023) is the follow-up of the project eBooks on Demand (EOD) (2009-2014). In both projects, networks of libraries have cooperated in the digitisation of library materials, making the digitised collections available to a broader public. In the EOD project, a common service was developed in which more than 40 European libraries are cooperating. Digitisation in that project encompasses public-domain library materials produced up to the 19th century. In the EODOPEN project, on the other hand, 15 libraries from 11 European countries [2] cooperated in the digitisation of 20th and 21st century textual materials which have not yet been digitised and are still of public interest. These library materials are copyrighted or out-of-print or have never even been in print at all, and reprints or facsimiles are out of sight. The EODOPEN project focuses on making them available to the broad public, respecting current copyright rules. In addition, with alternative delivery formats, in particular for mobile devices and for blind or partially sighted users, this digitised content will reach a broader audience.

2.2 Overview of end-users, mobile devices, and delivery formats
Mobile devices are inseparable tools of the global information society. According to Eurostat [3], 94% of young people and 77% of the adult population in the EU-27 made daily use of the internet in 2019. The most common mobile devices for internet connections were mobile phones or smartphones, laptops, and tablet computers. This share is still rising sharply.
The same source says that in “the EU-27 there were, on average, 1,220 mobile phone subscriptions per 1,000 inhabitants in 2018; in other words, there was an average of 1.2 mobile subscriptions per person. Since the late 1980s and early 1990s, the number of subscriptions has increased rapidly as mobile phones, and later smartphones, have become commonplace.” [4] Smartphone technology is already part of our everyday life. Through different networks and social media, it enables a constant interconnection with other tools and people. Consequently, the service economy is trying to reach its customers through such devices, adjusting its content and visibility.

[2] EODOPEN partners are (by country alphabetical order): University of Innsbruck (Austria) (coordinator), Czech Academy of Sciences Library (Czech Republic), Moravian Library (Czech Republic), Research Library Olomouc (Czech Republic), National Library of Estonia (Estonia), University of Tartu (Estonia), University of Greifswald (Germany), University of Regensburg (Germany), National Széchényi Library (Hungary), University of Vilnius (Lithuania), Nicolaus Copernicus University in Torun (Poland), National Library of Portugal (Portugal), Slovak Centre of Scientific and Technical Information (Slovakia), National and University Library, Ljubljana (Slovenia), and National Library of Sweden (Sweden).
[3] Being young today in Europe – digital world. EUROSTAT. Available on 4 March 2022 at https://ec.europa.eu/eurostat/statistics-explained/index.php?title=Being_young_in_Europe_today_-_digital_world
[4] European Neighbourhood Policy – East – statistics on science, technology and digital society. EUROSTAT. Available on 4 March 2022 at https://ec.europa.eu/eurostat/statistics-explained/index.php?title=European_Neighbourhood_Policy_-_East_-_statistics_on_science,_technology_and_digital_society
The changes also have an impact on European regulations, for instance the adopted EU Directive 2016/2102 of the European Parliament and of the Council of 26 October 2016 on the accessibility of the websites and mobile applications of public sector bodies. In 2019, the second EU Directive 2019/882 of the European Parliament and of the Council of 17 April 2019 on the accessibility requirements for products and services was adopted; it also refers to eBooks. This directive is aimed mainly at publishers, who are to create accessible born-digital eBooks; it is thus evident that the accessibility of eBooks for various special needs is also being closely considered. The directive does not cover digitised content, but organisations such as libraries and other cultural institutions should nevertheless apply the necessary changes in their practice. The same directive stimulated many initiatives for easier creation of accessible publications. One example is the open-source tool WordToEPUB [5], which is being developed by the DAISY Consortium [6]; it enables easy creation of EPUB files from Microsoft Word.

The World Health Organisation (hereinafter referred to as WHO) reports in its World Report on Vision (2019) that “Eye conditions are remarkably common. Those who live long enough will experience at least one eye condition during their lifetime. Globally, at least 2.2 billion people have a vision impairment or blindness, of whom at least 1 billion have a vision impairment that could have been prevented or has yet to be addressed.” In addition, considering the ageing of the population, the numbers are expected to increase further due to illnesses which occur later in life, such as diabetes.

The European Blind Union provides the following statistics for Europe:
• “There are estimated to be over 30 million blind and partially sighted persons in geographical Europe.
• An average of 1 in 30 Europeans experience sight loss.
• There are four times as many partially sighted persons as blind persons.
• One in three senior citizens over 65 faces sight loss. 90 percent of visually impaired persons are over the age of 65.” [7]

For blind and partially sighted people, and also for other users, it is important to consider how much a person can use his/her sight and how one’s sight can vary from day to day with light conditions, tiredness, stress etc. That is why it is important to enable users to adapt the visual presentation of the text to fit their needs. Some of the most common challenges, faced mostly by partially sighted people, are: problems of focusing on the text, reduced contrast sensitivity, reduced field of vision, sensitivity to movement, visual fatigue and similar. For them, the most useful adjustments are changes to font size, font type and colour themes, and adjustments to margins and spacing. Also of great importance are the option to access the full text (preferably with checked OCR) and the ability to use assistive technologies.

The advantages of mobile devices are their availability (low price) and easy handling. Today, it can be estimated that every library user has a smartphone. We can assume that smartphones will be used in the future as well. According to some forecast statistics [8]:
• Smartphone use: 2.5 billion in 2016, forecast to increase to 3.8 billion in 2021.
• By the end of 2020, 46.5% of the world’s population owned a smartphone.
• In 2019, smartphone shipments amounted to around 1.37 billion units.
• In 2019, tablet shipments amounted to around 48.6 million units (EU: 55.7% Apple, 22.54% Samsung).
• The popularity of e-readers is declining (the phone is the new e-reader).

[5] More information about WordToEPUB and guidance: https://daisy.org/activities/software/wordtoepub/
[6] DAISY Consortium: https://daisy.org
[7] European Blind Union. (n. d.). About blindness and partial sight. Available on 4 March 2022 at http://www.euroblind.org/about-blindness-and-partial-sight/facts-and-figures#details
[8] Statista: https://www.statista.com

The development of mobile devices also has a strong impact on the development of their operating systems and tools. That is why the service sector, including libraries and organizations from the field of culture, should be aware of it. From that perspective, it is very important to plan and produce an optimal number of file formats that are supported by these devices.

[Figure 1: Mobile devices' important features. The diagram relates three elements: FORMAT (PDF, EPUB, …), PLATFORM (webpage, reading app, …) and DEVICE (smartphone, notebook, …).]

eBooks can be read on different kinds of mobile devices, for instance on e-readers (Kindle, Kobo, Midia Inkbook, NOOK, etc.), smartphones, tablets and portable computers (notebooks). The selection of the file format for delivery and/or access depends on the type of device (size of screen, visual presentation) and the existing platform (the most used are Microsoft, Android, and iOS) (Figure 1). There are no problems in accessing PDF files through devices with bigger screens. Since PDF is not a responsive file format, it is not recommended for smaller devices like smartphones or e-readers. Some platforms support only certain types of formats. For instance, Kindle e-readers support AZW/MOBI formats and not the EPUB format. A DAISY reader application can be installed on Android. As regards blind and partially sighted users, there are platforms that also support speech synthesis or have functionalities for additional adjustments of the screen and the mode of access. For this group of users, continuous access to information is particularly important. They need adequate file formats and assistive technologies for reading (for instance braille displays or screen readers) as well as platforms (webpages, digital libraries, applications) created with accessibility in mind and therefore compatible with assistive technologies. Tablets and e-readers are the devices best suited for reading eBooks, and they require alternative formats.
In addition, the use of smartphones is particularly popular among young users. Blind and partially sighted users require other special formats, for instance in the form of Digital Talking Books. To meet these demands, up-to-date delivery formats (e.g. EPUB) and formats for special needs (e.g. DAISY, mp3) are necessary.

2.3 A brief overview of the surveys’ results on special needs of users and technical requirements for digitisation
From 20 April to 30 June 2020, the EODOPEN partners conducted two additional surveys among users of all fifteen EODOPEN project partners. The first survey aimed to gain insight into the reading experience of eBook readers, while the second was done to reach blind and partially sighted eBook users and understand their needs. Besides identifying users’ experience of reading eBooks, the objective of the surveys was to get an overview of positive and negative characteristics of eBook features, usage on different mobile devices, and eBook file formats. The second survey was focused on the special needs of blind and partially sighted eBook users.

2.3.1 Survey on eBook users
The findings of the survey of special needs related to mobile devices are based on data collected from 1,718 respondents, who answered at least 80% of the questions. Most of the respondents were aged from 20 to 29, so it is not surprising that the majority of respondents represent the student population. They are followed by library professionals and academics. According to the findings, approximately 50% of respondents access eBooks in a library, borrow them, or access them online through the digital library. About 40% usually purchase eBooks through commercial services. Most of the respondents (86%) prefer to download eBooks and read them offline. 57% of them usually read an eBook from beginning to end, and the reading time of 60% of respondents is longer than 30 minutes.
Respondents marked downloading, full-text search, browsing, zooming, and pagination as either important or very important. As reported by respondents, the most common problems with eBooks are full-text restrictions, pagination, an ambiguous navigation system, and lack of responsive design. Approximately 67% of respondents consider a text format to be the most important, and less than 20% of them use software for converting the existing format to a more suitable one. The most preferred file format for eBooks is PDF (45.76%), followed by EPUB (38.36%). HTML, TXT, and RTF were each chosen by only 1%-4% of respondents. The devices most used for accessing and reading eBooks are the notebook and the smartphone. For eBook reading, mobile device users prefer the e-reader (49.19%), followed by the tablet (24.16%) and the notebook (20.02%). Respondents consider the smartphone a less appropriate mobile device for reading eBooks (5.88%).

According to the findings of the survey conducted among eBook users, a considerable number of them are accessing eBooks. Not all file formats are suitable for mobile devices. That is why 67% of users pay attention to the format and 20% of them convert it to other, more suitable formats. There are also some functionalities preferred by the users, like downloading, full-text search, browsing, zooming, and pagination. Interestingly, most users named PDF as the preferred file format for eBooks, followed by EPUB.

2.3.2 Survey on blind and partially sighted eBook users
Data collected from 525 respondents, who answered at least 50% of the questions, were analysed. Most of the respondents were more than 40 years old, and the majority of them represented the retired population. Respondents were divided into three categories of equal size according to vision loss: blind, almost completely blind, and partially sighted.
We believe that it is important to consider the needs of blind and partially sighted people, and also the needs of the elderly population, in the context of digital literacy and mobile device usage. Respondents prefer to borrow/download eBooks in the library (58.1%) and to access them through the digital library (43.24%). 84.76% of respondents prefer to download eBooks and read them offline, while 26.86% read them online. Most read from start to end (73.72%), and the reading time is usually longer than 30 minutes (75.24%). Respondents search for the table of contents (53.72%), abstract (20.95%) and references (16.76%). The most important functionalities are downloading (83.04%) and full-text search (62.09%). The main problems that they had to cope with were: specialized software for reading (52.38%), the eBook navigation system (43.05%), and full-text restriction (34.48%). 54.29% of respondents pay attention to text format. The most preferred file formats for eBooks are PDF, TXT and EPUB. We assume that their use depends on remaining sight and assistive technologies. A third of respondents use software to convert the existing format. The most often used software and applications are ABBYY FineReader, Balabolka, Calibre, Voice Dream Reader, MS Word, Robobraille and Adobe Acrobat. Most often they convert to TXT format, followed by variations of MS Word. The most preferred devices are notebooks and smartphones. We assume that e-readers are not thoroughly compatible with assistive technologies. The most often used assistive technologies are screen reader software in the native language (62.29%) and screen reader software in other languages (23.62%). Screen reading software in the native language is mostly used on all mobile devices, while a braille display with a keyboard is most often used on notebooks.
The survey on blind and partially sighted eBook users shows that the use of eBook functions/features, file formats, devices and assistive technologies can vary, depending on how much a person can use his/her sight. That is why it is important to consider both completely blind persons, who wholly rely on assistive technologies, and others who would like to use their remaining sight with the support of various adjustments.

2.4 The current situation regarding digitisation workflows and access in EODOPEN partners' organizations
In 2020, the National and University Library (Slovenia) conducted a survey among EODOPEN partners about their digitisation practices for blind and partially sighted users and their users' use of mobile devices. As regards their digitisation practices, they reported that they apply an average scanning resolution of 300-400 dpi and optical character recognition (OCR) by ABBYY FineReader, thus achieving 97%-99% accuracy measured against ground truth. In addition, some of the partners use Transkribus and Tesseract for OCR. For most partners the delivery format is PDF/A, in some cases MS Word combined with mp3 (University of Regensburg), MS Word (Slovak Centre of Scientific and Technical Information, University of Regensburg, National Library of Estonia, University of Torun, Poland, National Széchényi Library) and other image file formats like JPEG and TIFF (Research Library Olomouc, National Library of Estonia), depending on the digitised content. EPUB is used as a delivery format only in the Slovak Centre of Scientific and Technical Information, Vilnius University, and the National Széchényi Library. These libraries cooperate with organizations for blind and partially sighted people and serve their users. That is why they have acquired a certain level of experience among the EODOPEN project partners. One of the surveyed aspects was also the users’ access to digital content. Most of them accessed delivery formats other than PDF and EPUB in EODOPEN libraries, such as XML and HTML.
Students most commonly use notebooks, some users prefer smartphones and other devices, and very few use tablets. Windows is the most often used operating system, followed by Android and iOS.

2.5 Different experience regarding alternative and special formats
Available guidelines and recommendations on digitisation processes have been made by public or private institutions which primarily deal with digitisation, and by associated organisations on an international and national level, such as libraries, museums, or archives. They either stem from project cooperation or have been prepared by providers of digitisation equipment and software/hardware. Guidelines and recommendations mainly focus on the process of digitisation per se and the results of digitisation for various types of material, including different technical characteristics. Long-term preservation formats and sustainability factors prevail in guidelines and recommendations, whereas file formats for end-users receive little attention. As a part of the Succeed project [9], Recommendations for metadata and data formats for online availability and long-term preservation have been prepared. As stated in these recommendations (2014), “delivery files for the end-user should be undemanding to use and simple to display. It is also advisable to consider using several delivery formats for specific digital objects, as different users can have contrasting preferences.”

[9] Project Succeed: https://www.succeed-project.eu/home

3 Guidelines and recommendations for the provision of alternative and special formats
The results of the surveys described in previous chapters made it possible to assemble information on the EODOPEN partners’ practices and their users’ needs regarding access to eBooks. Although partner organizations have different digitisation workflows, there are common steps that are important for achieving good results.
We also compared the digitisation outputs and access formats, trying to reconstruct an optimal workflow and access file production for eBook readers. These Guidelines are based on the results of the survey and on an overview of good practices and standards. They follow the phases of the digitisation workflow, which ends with the creation of delivery formats. These are:
1. Digitisation planning
2. Different phases of processing:
   a. Image capturing / scanning
   b. Image processing
   c. Full-text generation
   d. Conversion to different formats
3. Recommended delivery file formats
4. Additional phases of implementation for blind and partially sighted users
   a. Creating accessible documents
   b. Creating audiobooks (mp3, DAISY)
5. Using open source software for file format conversion
6. Examples of implementation (workflows, scenarios, cases)
Good practice recommendations are given for each of the listed phases.

3.1 Digitisation planning

This is the most important phase when starting a digitisation project. We need to clarify why we digitise, which target group we would like to reach, what kind of material we would like to digitise, how we will do it, and with what resources and procedures. Usual reasons for digitisation are to build a digital collection or digital library, to increase access to these collections, to respond to high user demand and reduce the burden on staff, to protect original library materials, to offer additional functionalities to users, or to support promotion and marketing, fundraising, staff training, etc. With regard to different target groups, consideration should be given to mobile device users and to blind and partially sighted users, for whom additional procedures will be necessary.

Usually, digitisation planning comprises deciding on the following phases:
• Selection or acquisition (monograph publications, articles, serial publications, images, etc.).
• IP clearance: intellectual property rights should be respected and cleared before digitisation starts (see the models of IP clearance provided by EODOPEN WG5 and the IP clearance documentation tool developed by WG6).
• Metadata definition: descriptive, structural, and administrative (technical, preservation, use, DRM) metadata. Technical metadata can be embedded in the image, for example filename, document file, application, date created, date file created, date modified, file size, dimensions in cm, resolution in dpi, etc. Descriptive and structural metadata should be saved in a software-neutral and standard-compliant form.
• Digitisation requirements: way of access, required OCR accuracy against ground truth, delivery file formats (thumbnail and better-quality user access formats), bit depth, capture resolution, colour management, scanning technology, etc.
• File naming system: there are several methods of naming the files resulting from digitisation. In principle, each file name must be unique and simple. It can be meaningful (with abbreviations) or non-descriptive. The general convention is to use no more than 8 characters in the file name, consisting of lowercase letters of the Latin alphabet and the numerals 0-9, plus 3 characters for the file extension (*.tif, *.pdf). Alphanumeric characters, dashes and underscores can be used, but no spaces or special characters such as \ / : * ? " < > |.
  Example: [Call No./ID_vol._nr._page]
  • 13608448_19_4_001.jpg (1st page of 1st issue)
  • 13608448_19_4_002.jpg (2nd page of 1st issue)
  • 13608448_19_4_003.jpg (3rd page of 1st issue)
  Example: [type+ID_vol._nr./part_Leaf_versus]
  • MS145_3_7_001
  • MS145_3_7_002
  • MS145_3_7_002_2
  • MS145_3_7_003
  …
• Decision on in-house or outsourced scanning: this decision depends on the equipment and staff available for carrying out these processes.
• Long-term availability (data and metadata format, secure storage system)

3.2 Different phases of processing

This is the production phase, which comprises all steps needed to make an accessible copy of the publication. To access the digital content from different mobile devices and to enable access for blind and partially sighted users, it is necessary to generate a high-quality full text that matches the ground truth almost 100%. The European project IMPACT (2008-2012) developed many tools for optical character recognition (OCR) of historical texts. They are available at the IMPACT Centre of Competence10. These tools are very effective with modern printed texts. The production phase consists of the following activities:
1. Image capturing (scanning)
2. Image processing
3. Full-text generation
4. Conversion to different delivery formats

3.2.1 Image capturing / scanning

Scanning technology: there are different types and brands of scanners. When buying a scanner, bear in mind that in some cases the processing software is more expensive than the scanner itself. Each type of library material has its own scanning requirements (format size, …), and this should be considered when purchasing a scanner.

Scanning software (for capture, processing, and delivery): scanners usually come with basic scanning software, but very often it does not provide enough functionality for complete image post-processing. Some high-quality scanning software is compatible with different brands of scanners. In addition, OCR processing tools are needed for full-text generation.

Resolution (spatial resolution): According to Monson (2017, p. 69), “The spatial resolution of a digital image refers to the level of spatial detail that it contains, with higher resolution denoting greater detail, clarity, and sharpness.
Resolution is expressed in pixels per inch (ppi) or the often-interchangeably used dots per inch (dpi); it is essentially a measurement of the density of pixels in a given area and is dependent on pixel size.” There is a difference between the input and the output resolution. To achieve good OCR results, a resolution of at least 300 dpi is suggested. The appropriate resolution depends on the size of the original document and on the size of the printed text11.

Bit or colour depth is the “… amount of storage space allocated to an individual pixel” in a digital image (Monson, 2017, p. 76). Most guidelines recommend a minimum of eight-bit grayscale and 24-bit colour (Monson, 2017). 24 bits per pixel are known as “true colour”. This is the bit depth detected by the human eye. It can represent 16,777,216 (2^24) different colours in the RGB range. For preservation purposes, especially when scanning images and old printed materials, 48 bits could be the optimal solution. “Bitonal imaging may be acceptable for black-and-white textual documents that display good contrast between the printed text and the paper background, but grayscale is often preferred for all noncolor documents to capture the tonal range of the original.” (Monson, 2017, p. 76).

10 The IMPACT Centre of Competence: https://www.digitisation.eu/knowledge/library/succeed-training-materials/
11 A resolution higher than 300 dpi is preferred if the font size is small. Available from: https://support.abbyy.com/hc/en-us/articles/360017733239--Image-Resolution-What-are-the-optimal-settings-for-OCR

According to ABBYY (Weber, 2020), OCR is executed on a bi-tonal (black and white) image. Therefore, for complex layouts, it is recommended to scan in colour or at least in greyscale, from which a good and suitable binary image can be generated.

3.2.2 Image processing and full-text generation

Various types of software enable different post-processing functions to edit the images captured during scanning.
The most frequently used functions are de-skewing, rotating, splitting pages, cropping, fixing resolution, brightness and contrast, straightening text lines, and denoising. To achieve good optical character recognition results, the scanning resolution and bit depth should be considered, the scan should be neither too bright nor too low in contrast, and the page should be straightened so that it is aligned with the text. Additional problems may occur if a page is blurred or discoloured, or if pages are dirty or have been marked by previous users (e.g. study materials). Problems can also arise with thin paper, because the text from the other side shows through, or when a document contains decorative fonts.

Image processing consists of the different actions needed to produce the optimal image for full-text generation and access. These are cropping, de-warping, touch-up, colour adjustment, contrast adjustment, de-skewing (i.e. straightening out a crooked scan), denoising, etc. Some OCR tools, like Tesseract, incorporate image processing operations. Moreover, tools specially developed for image processing, such as ScanTailor12 or Unpaper13, can be used.

An important step in digitisation is OCR, which converts scanned images into machine-readable text. The ODLIS dictionary defines OCR as a “process by which characters typed or printed on a page are electronically scanned, analysed, and if found recognizable based on appearance, converted into a digital character code capable of being processed by a computer. OCR eliminates the time-consuming process of re-keying information available in print, but results can be unpredictable if the scanned copy is imperfect or contains diacritical marks or unrecognizable characters” (Reitz, Joan M., 2002). OCR is necessary both to enable full-text search and to allow visually impaired users to access the text with their assistive technologies. ABBYY FineReader, which supports almost 200 languages, is the most frequently used OCR tool.
However, it supports only a limited number of languages in their blackletter (Fraktur) version; for example, there is no support for Polish. Some EODOPEN partners successfully use Tesseract14 and Transkribus15 for OCR of modern texts.

The question is how many problems can be solved before digitisation (e.g. by erasing users’ annotations) and how many must be solved later, by checking the OCR output and removing unwanted content from the publication. Checking OCR can be quite time-consuming, especially when manual corrections are required.

Page segmentation is another important aspect of OCR. It defines which elements (text, image, table etc.) appear on each page and in which order. It is especially relevant for multi-column or complexly structured documents (e.g. textbooks). This is of the utmost importance for blind and partially sighted readers, because assistive technologies read elements in a linear order. If the elements are not in the right order, this may confuse readers or even deter them from reading.

12 ScanTailor: https://github.com/scantailor/scantailor
13 Unpaper: https://github.com/unpaper/unpaper
14 Tesseract: https://github.com/tesseract-ocr/tesseract
15 Transkribus: https://readcoop.eu/transkribus/

Most frequently noticed OCR problems

1. External influences (annotations, smudges, dirt, official stamps, etc.) (Figure 2 and Figure 3)

[Image: digitised content and OCR results]
Figure 2: Example of user annotations, of text marked with a pen, and of an incorrectly recognized math equation. Source: Šterk, Karmen. (1998). O težavah z mano, p. 82. Available at: http://www.dlib.si.

[Image: digitised content and OCR results; the OCR output reads: na Slovensku učiněný po vyjádření·pověřence informací, 2 a KNIHOVNA * 17 * V/ \V \ iO vA]
Figure 3: Example of a stamp affecting OCR. Source: Československá filmová společnost. Znárodněný film a pokusy o jeho "odnárodnění": k návrhu zákona, kterým se sahá na samu podstatu dekretu presidenta republiky o znárodnění filmu. (1947), p.
17. Available at: https://kramerius.lib.cas.cz/.

2. Structure of the text (columns, tables, schemas, etc.) (Figure 4 - Figure 6)

[Image: digitised content and OCR results]
Figure 4: Another example where columns are not recognized correctly. In this case, the columns were not recognized at all, so the text is shown in straight lines and should be corrected. Source: Vrhovnik, Ivan. (1926). Gostilne v stari Ljubljani, p. 43. Available at: http://www.dlib.si.

[Image: digitised content and OCR results]
Figure 5: Example where the structure is not properly recognized. Source: 40 Jahre Stettiner Electricitäts-Werke. (1930), p. 17. Available at: https://www.digitale-bibliothek-mv.de/viewer/index/.

[Image: digitised content and OCR results]
Figure 6: Example of incorrect text flow and relations between terms. Source: Vihalem, Ann. (1996). Marketing: hind, müük ja reklaam, p. 19. Available at: https://www.digar.ee/arhiiv/et.

3. Special characters (mathematical and chemical symbols, initials, …) (Figure 7 - Figure 10)

[Image: digitised content and OCR results]
Figure 7: Example where a curly brace causes trouble. Source: Busson, Paul and Leopold Stolz. (1923). Waldmärchen: ein Traumspiel in einem Aufzug, p. 1. Available at: https://diglib.uibk.ac.at/UIB/.

[Image: digitised content and OCR results]
Figure 8: Example of a decorative initial. Source: Košutnik, Silvester. (1912). Veliki Vsevedež, p. 3. Available at: http://www.dlib.si.

[Image: digitised content and OCR results]
Figure 9: Example where chemical formulas are not recognized correctly. Source: Schultz, Gustav (1900). Die Chemie des Steinkohlentheers: mit besonderer Berücksichtigung der künstlichen organischen Farbstoffe, p. 38. Available at: https://diglib.uibk.ac.at/.

[Image: digitised content and OCR results; the OCR output reads: ALGEBRAVALEMID(a±b)2=a2±2ab+b2(a+b+c)2- a2+b2+c2±2ab+2ac+2bc(a±b)3=a3±3a2b+3ab2±b3a2-b2=(a+b)(a- b)a3±b3=(a±b)(a2±ab+b2)aman=am+n’anbn=(ab)n...ulogcb..1l°gab- |Ogca.Iogab-|ogba]
Figure 10: Example of incorrect OCR of mathematical formulas. Source: Reimers, Elmar. (1988).
Matemaatilise analüüsi praktikum. I, p. i. Available at: https://dspace.ut.ee/.

3.2.3 Post-Correction

OCR results can be corrected by statistical error modelling, language modelling, or word modelling. Not all software allows a post-correction process, but Tesseract, for example, facilitates the use of dictionaries and word lists.

3.2.4 Conversion to different delivery formats

Many lists of text delivery formats can be found on the web. A list of text formats for archival purposes is available on the Library of Congress web page16. Recommendations regarding OCR and delivery formats can be found at the IMPACT Centre of Competence17. These are part of the SUCCEED project’s recommendations for digitisation. For delivery purposes the following file formats are recommended: JPEG, PDF, JPEG2000 (JP2), EPUB, and MOBI derived from EPUB.

3.3 Recommended delivery file formats

This chapter presents a list of delivery file formats into which digitised publications can be converted, with a description of their use, advantages and disadvantages. The focus is on the delivery formats most often used by the EODOPEN partners and on the formats preferred by their users. We also follow the formats and recommendations most commonly used and preferred by the publishing industry, because we believe that these aspects should be considered when choosing the format. In addition, we follow the Succeed Recommendations (2014) and the File Formats Assessments provided by the Digital Preservation Coalition18 regarding file formats for online delivery, which can be applied in our recommendations as well. These file formats should
These file formats should 16 Format Descriptions: https://www.loc.gov/preservation/digital/formats/fdd/descriptions.shtml 17 Recommendations on formats and standards useful in digitization: https://www.digitisation.eu/knowledge/library/recommendations-for-digitisation-projects/recommendations- formats-standards-recommendations/ 18 File Formats Assessement, DPC: https://wiki.dpconline.org/index.php?title=File_Formats_Assessments 24 be “easy to use and simple to display”. In regard to quickly changing technology and formats as well, we suggest that such developments should be constantly observed and adapted19. PDF (portable document format) Usage: Texts and images. ISO standardised. For universal accessibility it is best to use PDF/UA format with structured content20. Advantages: PDF format is the most popular among SUCCEED and EODOPEN survey respondents and it is also prevalent in the existing recommendations. Recent PDF format versions can be optimized to enable progressive download. It also supports multiple layers, therefore can be used for images or textual content, or both (Recommendations for metadata …, 2014). The format retains the visual presentation of a digitised publication. // The format is representing a printed page in a digital file. Disadvantages: PDF requires additional software tools for reading. It is not a responsive data format, and can’t be adjusted to different screen/letter sizes. For accessibility purposes, additional steps need to be undertaken, because the content has to be semantically tagged. JPEG Usage: texts and images Advantages: “JPEG format has been indicated by most of the existing recommendations (82%) and the majority of Succeed survey respondents (71%). It is a general-purpose image format which uses lossy compression to minimize the size of an image. JPEG is supported by almost all web browsers, including mobile ones” (Recommendations for metadata …, 2014) Disadvantages: Not suitable for preservation. 
The text is not accessible for full-text search or for people with special needs.

JPEG2000 (JP2)
Usage: texts and images
Advantages: it supports high-resolution images at different resolutions and can apply very high resolution to selected parts of the image (ROI, region of interest). It can store colour information at 48-bit colour depth, it can store metadata inside the file, and it supports IPR management (preventing download of the whole image). If the header is corrupted, freeware is available to fix it.
Disadvantages: “It requires dedicated software tools to display in a user’s web browser, but there are already tools supporting such features (e.g. IIIF, OpenSeadragon). Due to such solutions it is possible to use production master files as a direct source for online delivery of digital content” (Recommendations for metadata …, 2014). The text is not accessible for full-text search or for people with special needs.

19 For example: EPUB 3 is the currently used version of the format, but EPUB 4 is already planned and it “will be a specific profile for Portable Web Publications” (Pellegrino, 2018).
20 PDF/UA can be created using ABBYY FineReader; accessibility can be checked, for example, with the PDF accessibility checker PAC3. https://support.axes4.com/hc/en-us/articles/201957988-PDF-Accessibility-Checker-PAC-

TXT (text file)
Usage: pure textual format
Advantages: the text has no formatting or design, which makes it suitable for assistive technologies.
Disadvantages: it allows only linear reading, without any navigation and without any graphical elements.

HTML (Hypertext Markup Language)
Usage: text, images, videos and other visual elements
Advantages: it is the standard mark-up language for web pages, constantly updated by the World Wide Web Consortium (W3C). Together with CSS (Cascading Style Sheets), it enables a precise structure and display of the content which can be visually adjusted to meet the reader’s needs.
The content can be accessed through browsers and no special software is needed for reading. The format is fully accessible for people with special needs if the content follows the Web Content Accessibility Guidelines (WCAG).
Disadvantages: if created from digitised content, it requires additional processing work.

EPUB 3 (electronic publication)
Usage: text, images, videos and other visual elements
Advantages: it is an archive file consisting of XHTML (Extensible HyperText Markup Language) files, images and other supporting files. It has been standardised since 2007 and, as with HTML, its development is closely connected to the work of the W3C and to changes in the WCAG guidelines. The format enables a precise structure and display of the content which can be visually adjusted to meet the reader’s needs. Additionally, it adjusts to the size of the screen (reflowable content), which is useful for devices with a small screen (e.g. smartphones). The format is fully accessible for people with special needs if the content follows the Web Content Accessibility Guidelines (WCAG).
Disadvantages: if created from digitised content, it requires additional processing work. Additional software tools are required for reading. The format is not supported by Kindle devices.

AZW/MOBI
Usage: text, images, videos and other visual elements
Advantages: the format enables a precise structure and display of the content which can be visually adapted to meet the reader’s needs. Additionally, it adjusts to the size of the screen (reflowable content). The format is mostly used on e-readers, but with suitable applications it can be read on any device.
Disadvantages: if created from digitised content, it requires additional processing work. Additional software tools are required for reading on devices other than e-readers.
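
Because an EPUB file is an archive of XHTML files and supporting files, as described in the EPUB 3 entry above, its basic structure can be inspected with a few lines of script before the file is passed to a full validator. The following is a minimal sketch in Python, not a replacement for EPUBCheck or Ace by DAISY. It assumes the container layout defined by the EPUB Open Container Format (a `mimetype` entry stored first in the archive and a `META-INF/container.xml` file); the function name `inspect_epub` and the keys of the returned report are our own, hypothetical choices.

```python
import zipfile

# Media type that the EPUB container format expects in the "mimetype" entry.
EPUB_MIMETYPE = "application/epub+zip"


def inspect_epub(source):
    """Return a small structural report for an EPUB archive.

    `source` is a file path or a binary file-like object.
    This is only a quick sanity check (hypothetical helper), not a validator.
    """
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        # The "mimetype" entry must be the first entry of the archive.
        mimetype_ok = (
            bool(names)
            and names[0] == "mimetype"
            and archive.read("mimetype").decode("utf-8", "replace").strip()
            == EPUB_MIMETYPE
        )
        return {
            "mimetype_ok": mimetype_ok,
            # container.xml points reading systems to the package document.
            "has_container": "META-INF/container.xml" in names,
            # Content documents that make up the text of the publication.
            "xhtml_files": sorted(
                n for n in names if n.lower().endswith((".xhtml", ".html"))
            ),
        }
```

A report with `mimetype_ok` or `has_container` set to `False` suggests that the file was not packaged as an EPUB at all and should be regenerated before any accessibility check is attempted.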
MP3
Usage: audio format
Advantages: the content is presented in audio files, which are mostly used by people with special needs (blind, partially sighted, …) but also by other people for leisure, while travelling, etc.
Disadvantages: special equipment is needed to create the format (recording studios, software …). Unless special hardware or software is used, the position at which the listener paused is not saved. Files are often displayed out of order. Not every listener is content with the voice of the person who recorded the audiobook.

DAISY (The Digital Accessible Information System)
Usage: audio format or audio and textual format
Advantages: it is “an XML-based open standard published by the National Information Standards Organization (NISO) and maintained by the DAISY Consortium for people with print disabilities. DAISY has a wide international support with features for multimedia, navigation and synchronization”.21 “The DAISY Specification offers a flexible and navigable reading experience for people who are blind or print disabled”.22 Books in DAISY format “can include not just the audio rendition of the work, but also the full textual content and images”23. Navigation in DAISY books can be structured in more detail than in MP3 books. Another advantage is that the DAISY format saves the position at which the listener paused the audio file and also allows the pitch or speed of the sound to be changed.
Disadvantages: special equipment is needed to create the format. For reading, DAISY requires specific software or hardware tools.

21 Wikipedia. (2021). Comparison of e-book formats: https://en.wikipedia.org/wiki/Comparison_of_e-book_formats
22 The DAISY Consortium. (n. d.). DAISY Format. https://daisy.org/activities/standards/daisy/
23 Idem.

Additional note regarding audiobooks: W3C prepared specifications for born accessible audiobooks which are published as LPF (Lightweight Packaging Format).
LPF is a package format (similar to EPUB) which contains not only mp3 files but also supporting files that provide the functionalities of the DAISY format, while production is faster and easier. The W3C recommendations can be found in chapter 5.5 Guidelines for creating audiobooks.

3.4 Additional phases of implementation for blind and partially sighted users

3.4.1 Creating accessible document

When creating accessible documents, we need to be aware that each person has specific needs and that the content needs to be created in a way that allows visual adjustments and/or the use of a synthetic voice. Visual adjustments can vary from person to person and also from day to day or from hour to hour, depending on the kind of eye-related condition a person has. The most frequently used visual adjustments are text size, text font, colour themes or colour inversion, various types of spacing (between lines, characters, words, paragraphs), and adjusted margins. The most frequently used assistive technologies for reading eBooks are screen reading software, braille displays and magnifiers.

The DAISY Consortium is a leading global organisation with a vision of equal access to information and knowledge; its mission is to develop global solutions for accessible publishing and reading. It stresses that “… there are globally accepted standards and best practices for creating accessible digital content. Some of the most adopted standards are WCAG, Section 508, EPUB Accessibility, and PDF/UA”24. According to the consortium, it is important to pursue the following objectives:
• Creating a structured and navigable document
• Provision of text descriptions for graphical content
• Providing an adaptable format that is marked up semantically

When creating formats such as the various Microsoft Word file types, EPUB25, HTML, TXT or PDF from a digitised publication, the best option is to start the work in Microsoft Word, make the document fully accessible there, and then export it into the chosen format.
We propose the following basic structure for creating an accessible document:
• OCR check
• design of the eBook
• structure of the eBook and graphical elements
• math and science or special symbols
• accessibility check
• exporting to different formats

24 The DAISY Consortium. (n. d.). Creating accessible Word documents. Available on 4 March 2022 at https://daisy.org/info-help/guidance-training/daisy-tools/creating-accessible-word-documents/
25 EPUB can be made directly from a Microsoft Word file with WordToEPUB, a program created by the DAISY Consortium which is also available as a Microsoft Word extension. It enables quick creation of an accessible EPUB.

3.4.1.1 OCR check

The OCR results should be checked, preferably by reading through the whole document. If that is not possible due to personnel shortages, lack of time or other reasons, it is important to do a basic review of a few representative pages in order to find the most common mistakes and to identify the sections that are most probably wrongly recognized. Once the problems are identified, it is not difficult to eliminate them in the entire document, and this should not be too time-consuming. When working in Microsoft Word, the mistakes can be fixed quite easily with the Find and Replace function. Mistakes are also easier to spot if the entire document is marked with the language of the text. In this way, we can quickly find the most common mistakes that occurred during recognition for the chosen language. Most often, these are merged words (Slovenian example: ”seje” instead of ”se je”) or wrongly recognized words or characters (Slovenian examples: ”rn” instead of ”m” or ”5” instead of ”S”). Project partners also reported that Gothic (German) type often causes problems with “a” and “o”, and that “s” is often recognized as “f”. With Antiqua, the typical problem is that “rn” is recognized as “m”.
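
Where the OCR text is also available outside Microsoft Word (for example as a TXT export), the same find-and-replace approach can be scripted. The sketch below is a minimal illustration in Python under stated assumptions: the correction list is hypothetical and would in practice be compiled by hand while reviewing representative pages, and the function name `apply_corrections` is our own. Only whole words are replaced, and the number of replacements is reported so that each fix can be reviewed, since a "wrong" form may also be a legitimate word elsewhere in the text.

```python
import re

# Hypothetical correction list compiled during the review of sample pages.
CORRECTIONS = {
    "seje": "se je",    # merged words (Slovenian example from the text)
    "rnesto": "mesto",  # hypothetical case of "rn" recognized instead of "m"
}


def apply_corrections(text, corrections):
    """Replace whole words only and count how often each fix was applied."""
    counts = {}
    for wrong, right in corrections.items():
        # \b restricts the match to whole words, so that "seje" inside a
        # longer word is left untouched.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
        # A function is used as replacement so that `right` is taken literally.
        text, counts[wrong] = pattern.subn(lambda _match, r=right: r, text)
    return text, counts


corrected, report = apply_corrections(
    "Ko seje vrnil v rnesto, seje ustavil.", CORRECTIONS
)
print(corrected)
print(report)
```

Unusually high counts in the report are a signal to check the affected passages by hand rather than to trust the automatic replacement.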
When checking OCR, it is also important to:
• remove characters that mark broken lines in the original copy (example: - or ),
• remove repeated spaces or tabulators ( , ),
• change abbreviations to their word equivalents (e.g. = for example; g = grams; Hz = Hertz etc.), which makes reading with a screen reader easier and more understandable.

3.4.1.2 Design of the document

The look of the document should be as basic as possible: a clear font with a size between 12 and 14 pt., without decorative fonts. Preferred fonts are Arial, Verdana, or Helvetica. The text should be left-aligned, in one column, and oriented the same way throughout the whole document. Use bold, italic and all-caps text sparingly. The colour contrast between text and background should be carefully considered. It is important to use styles to keep the visual design consistent throughout the whole book; the styles later also serve as the basis for the structure of the book. The most important styles are the normal style and the heading styles. It is also important to mark the language of the document and the language of any sections written in a different language26. It is recommended to use templates to keep the styling consistent across all adapted publications.

26 When users use screen reading software, the change in language also changes the synthetic voice. It makes a difference whether a German text is read out loud with a German synthetic voice or with the voice of another language.

3.4.1.3 Structure of the document and graphical elements

The document should have a logical reading order (pay particular attention if the original text has more than one column or a complicated structure). The previously mentioned use of styles makes it easy to create a table of contents with active links, and thus enables clear navigation. It is important to pay attention to footnotes and endnotes, which should work both ways (from the main text to the note and back).
Since the new publication is connected to the digitised original, it is also important to add the page numbers from the original, placed as close as possible to their original position (example: at the end of a sentence).

Tables should be structured in such a way that there is sufficient space around the edges of the cells. They should not be inserted as pictures. The header row has to be marked. If possible, avoid empty or merged cells, which sometimes requires reshaping the whole table.

Graphical elements (pictures, graphs, etc.) should be placed in line with the text and left-aligned. They should have their original, numbered captions. Elements that carry information crucial for understanding the content and are insufficiently described in the surrounding text need alternative text (alt-text). The alt-text should be short, only a few sentences, or approximately 125-250 characters long. If possible, avoid using automatically generated alt-text27, because the AI behind it is not yet developed enough to give an appropriate description of the images. If alt-text is not needed for the aforesaid reasons, it should be left empty or the image should be marked as decorative. If a graphical element (graph, map, infographic or similar) requires a longer and more detailed description, add the longer description in one of the following ways:
• immediately after the graphical element, clearly stating that this is a longer description,
• in an endnote or footnote, clearly stating that this is a longer description,
• or, if the end format is EPUB, within a collapsible “details” element, for which some basic HTML knowledge is needed28.

3.4.1.4 Math and science or special symbols

This area is more complex and there is no single solution for it. It is important that such elements are not presented as pictures; they should be encoded in MathML (Mathematical Markup Language) or LaTeX.
MathML “is intended to facilitate the use and re-use of mathematical and scientific content on the Web, and for other applications such as computer algebra systems, print typesetting, and voice synthesis. MathML can be used to encode both the presentation of mathematical notation for high-quality visual display, and mathematical content, for applications where the semantics plays more of a key role such as scientific software or voice synthesis”.29 MathML keeps the visual presentation for sighted users, while the code in the background makes the equation accessible. Currently, this is the optimal solution for presenting math to all types of readers, particularly to blind and partially sighted readers, because reading systems respond to MathML in the same way as to normal text (see Figure 11), and the code enables browsing through each element of the math presented. At present, not all reading systems support this function, but they are being developed to meet this need, too.

27 Microsoft Word enables automatically generated alt-text.
28 Detailed instructions are available at: http://kb.daisy.org/publishing/docs/html/details.html
29 Froumentin, M. (2019). Mathematical Markup Language (MathML). Available on 4 March 2022 at https://www.w3.org/Math/whatIsMathML.html

LaTeX “is a document preparation system for high-quality typesetting. It is most often used for medium-to-large technical or scientific documents but it can be used for almost any form of publishing”30. In LaTeX, complex mathematical, chemical and other formulas or presentations are written in a single line. When using it, it is important to consider whether blind and partially sighted readers are sufficiently acquainted with this kind of typesetting. It is also less pleasant to read for sighted users.

When encountering an especially complex example to which neither of the two is applicable, it is acceptable to present the element as a picture and to provide alt-text or additional explanatory text.
Example of a math formula in MathML format [31]:
x + 5 = 0 ⇨ <math><mi>x</mi><mo>+</mo><mn>5</mn><mo>=</mo><mn>0</mn></math>

Example of a math formula in LaTeX format:
6/3 = 2 ⇨ $ \frac{6}{3} = 2 $

The application InftyReader [32] can be used for optical character recognition and translation of scientific documents (including math symbols) into LaTeX, MathML and XHTML. For manual work in Microsoft Word, we suggest the built-in library of mathematical expressions or the manual equation builder, both of which support MathML. The MathPix snipping tool [33] is a utility that converts images (for example, a screenshot of a formula) into editable markup. It can be helpful for working with more complex mathematical expressions in Microsoft Word (Figure 11).

[30] The LaTeX Project (n. d.). An introduction to LaTeX. Available on 4 March 2022 at https://www.latex-project.org/about/
[31] Example taken from: https://www.drillster.com/info/mathml
[32] InftyReader: http://www.inftyreader.org/
[33] MathPix snipping tool: https://mathpix.com/

Figure 11: Example of math created with MathML, viewed in EPUB format with black theme activated

3.4.1.5 Accessibility check

When a publication is drafted in Microsoft Word, it is important to use tools or software to check the extent to which it complies with accessibility standards, guidelines and good practices. The first accessibility check should be done in Microsoft Word, and the second after conversion to other formats (EPUB, HTML, PDF…). The second accessibility check can be carried out in various ways, preferably using at least two of the following procedures:
• use of software to check accessibility (examples: EPUBCheck, Ace by DAISY),
• manual accessibility check using screen reading software and a checklist,
• accessibility check with a selected test group of users.

3.4.1.6 Exporting to different formats from Microsoft Word

Once the work in Microsoft Word is done, exporting is very simple and most formats produced in this way are highly accessible.
Microsoft Word enables exporting into PDF, HTML, TXT, RTF and DOC or DOCX formats. For exporting into EPUB format, use the previously mentioned Microsoft Word extension WordToEPUB [34], which won the ALPSP Awards for Innovation in Publishing 2020. The tool currently enables conversion to EPUB, HTML and MOBI formats (Figure 12), and new features are continually being added to make document creation better and easier with each version. For further instructions, we suggest following the DAISY Consortium: its guidelines [35] for tools, its free webinar sessions and its news updates. Below is an example of a publication made with this tool, viewed on various mobile devices with different visual adjustments and with the ability to use screen readers.

[34] WordToEPUB: https://daisy.org/activities/software/wordtoepub/

Figure 12: Example of the EPUB publication Skrb za slepce (English: Care for blind people), created with the tool WordToEPUB and viewed on different mobile devices with various visual adjustments.

3.4.2 Creating audiobooks (mp3, DAISY)

Audiobooks are a special eBook format, mainly created for persons who are unable to read printed publications independently, for those who have no access to digital documents (PDF, EPUB, etc.) or who cannot or do not know how to use them, and for people who need sound for an easier reading experience (e.g. people with dyslexia). Audiobooks, especially mp3 files, are usually the easiest to access and are widely used by people with disabilities, because most hardware already has software for listening to music installed; alternatively, a person simply copies a book onto an SD card or USB drive and plugs it into an audio device. The main difference between the mp3 and DAISY formats is that the latter, which is also based on mp3 files, covers broader user needs, especially regarding navigation through the sound files.
Mp3 allows navigation from file to file within the audiobook (usually the book is divided into chapters); however, it also has a major disadvantage: if a user stops listening in the middle of a file, the software usually does not remember the position, so next time the user cannot continue from where they stopped, but only from the beginning of the file. The DAISY format, by contrast, is to some extent more difficult to access, since a device (see Figure 13) or software (e.g. Kota daisy reader, Dolphin EasyReader, Pratsam, etc.) that supports the DAISY format is needed. On the other hand, the user experience is closer to reading a printed book, because DAISY not only remembers where the user stopped reading but also enables navigation through chapters, paragraphs, sentences, pages, etc. Users can also create bookmarks or use the bookshelf option for loading more than one book. The depth of navigation is determined when the DAISY eBook is created. Fiction often requires navigation only between chapters and pages of the original publication, while non-fiction or textbooks can have a more detailed structure. Original pages or footnotes can be marked as well.

[35] WordToEPUB Guidance: https://daisy.org/info-help/guidance-training/daisy-tools/wordtoepub-guidance/

Figure 13: Example of DAISY reader hardware, which comes in several variations (Source: https://store.humanware.com)

The problems that users usually experience with audiobooks are related to the reader or the voice itself (e.g. the reading speed is too fast or the voice is too high). DAISY books therefore also enable adjusting the speed and the colour of the voice. This option is especially useful for human narrated books. Both mp3 and DAISY books can be created either with a human narrated voice, without any digital document being needed, or by converting a text-based document into sound using text-to-speech.
However, it is important to be aware that blind and partially sighted people in general prefer a human narrated voice. With the development of speech synthesis that simulates human voices, it is expected that creating audiobooks with text-to-speech will also become a possible workflow. However, this process is probably not faster than producing human narrated audiobooks. It should also be considered whether the developers of speech synthesis software permit its use for producing audiobooks. It is also important to highlight the W3C recommendation for audiobooks, which supports the LPF (Lightweight Packaging Format) format.

Using human narrated voice

When creating audiobooks with a human narrated voice, a person reads the book aloud and records it, preferably in a soundproof studio. It is important to note that not everyone can be a reader. Libraries for the blind and partially sighted often hold auditions for readers, in which many different aspects (pronunciation, speed, colour of the voice, foreign languages, etc.) are considered. Some of the most important factors at such auditions are:
• reading speed,
• loudness of voice,
• clear pronunciation,
• emphasis,
• use of foreign languages,
• disruptive factors (e.g. irregular breathing or other sounds),
• level of interpretation (preferably low, without role-playing or acting).

The UK Association for Accessible Formats (UKAAF), in its guidance documents [36], notes the importance of the recording environment (building, disturbances, noise transfer, etc.), the audio recording techniques, the position of the recording device and similar.
They also point out some fundamentals concerning the reader: what to avoid when reading and how to prepare for reading.

[36] UKAAF guidance documents: Audio recording techniques, Audio reading skills and Audio presentation skills.

If a library decides to produce human narrated audiobooks, there are some starting points, in line with the UKAAF guidance, to be considered in terms of the production environment, equipment and staff:
• recording studio: the studio should be located in a quiet part of the library to prevent noise; it should be big enough, with the option to regulate temperature and ventilation; it should be soundproof (door, walls, floor, windows), have appropriate lighting (e.g. no flickering) and be free of echo, so the walls should be covered with acoustic foam and no objects should be placed close to the walls; the room should be furnished with basic noise-free furniture (table, chair, additional light, reading stand);
• hardware for recording: the computer used for recording should be placed outside the studio to prevent any noise from the hardware; in addition, a good computer for editorial work, a mouse and keyboard, speakers, headphones, etc. are needed. Costs depend on the quality of the products used, but at least a medium standard of equipment is desirable;
• software for recording and editing: investing in better software to gain better results should be considered;
• financial resources for readers: a person who reads in the studio is usually paid for the work, and the cost differs between amateur and professional speakers;
• financial resources for editors: an editor does the final checks of the audio, edits the sound, removes noises, makes sure the sound level is similar throughout the whole book and checks the file structure. Such a person should have advanced knowledge of sound production;
• copyright consideration: an audiobook is a new publication, so copyright must be clarified before its production;
• to avoid duplicate productions, collaboration with the national library for the blind or other similar organisations is suggested.

Regarding the structuring and production of audiobooks, here are some additional points that should be considered:
• when choosing a reader, consider the content of the book;
• a book should have a colophon with basic information about the original book, copyright, etc.;
• if a book has a table of contents, a separate audio file is created for it;
• the book should be divided into chapters (one file per chapter);
• longer chapters should be divided into several parts;
• files should not be too long;
• the naming of the files needs to be consistent and meaningful (e.g. 003 Chapter 1 p. 5-15);
• files must be named and tagged in the correct order so that all devices play the whole book in the correct order;
• the reader should always check the microphone, audio and other settings before starting to read;
• the reader should always prepare and practise before recording;
• the reader should announce the start and end of each file (e.g. “Chapter 1” -> “End of Chapter 1”);
• the reader should read the original page numbers (for leisure reading, at the start of each chapter or file; for study or more structured materials, each page should be announced);
• the reader should describe the graphical material (e.g. images);
• the reader must remove any disturbing recorded noises (e.g. the sound of paper when turning the page, mouse and keyboard clicks, sounds coming from outside the studio such as an alarm or traffic).

The same mp3 files can then be used to create a DAISY book with open source software such as OBI [37] (Figure 14), which is also developed by the DAISY Consortium, or with other software such as Dolphin Publisher. The process of making a DAISY book takes somewhat longer because it needs more editorial work: splitting the audio file sentence by sentence (this is done by the application but needs to be checked), applying the markers for chapters, pages, etc.
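The points on consistent file naming and playback order in the list above can be illustrated with a minimal Python sketch. It follows the naming pattern of the example “003 Chapter 1 p. 5-15”; the Track structure, the function names, the “.mp3” extension and the sample titles and page ranges are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str       # e.g. "Chapter 1" (hypothetical sample data)
    first_page: int  # first original page read in this file
    last_page: int   # last original page read in this file


def file_name(index: int, track: Track, width: int = 3) -> str:
    """Build a name such as '003 Chapter 1 p. 5-15.mp3'."""
    return f"{index:0{width}d} {track.title} p. {track.first_page}-{track.last_page}.mp3"


def name_all(tracks: list[Track]) -> list[str]:
    # Zero-padded numbers keep alphabetical order identical to playback order
    return [file_name(i, t) for i, t in enumerate(tracks, start=1)]


if __name__ == "__main__":
    book = [
        Track("Colophon", 1, 2),
        Track("Table of contents", 3, 4),
        Track("Chapter 1", 5, 15),
    ]
    for name in name_all(book):
        print(name)
```

The zero-padded prefix is the important design choice: devices that sort files alphabetically would otherwise play file 10 before file 2.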
The person who is editing should have practical experience and knowledge of working with audio files.

[37] Obi: https://daisy.org/activities/software/obi/

Figure 14: An example of DAISY audiobook production using the open source software OBI. On the left is the table of contents; the audio of the selected chapter is divided sentence by sentence, with markers for headings and pages.

3.5 Using open source software for file format conversion

eBook users have various needs, depending on the type of mobile device they use or, in the case of blind and partially sighted users, on their special needs. Users themselves can convert the offered file format into a different one according to their needs. Alternatively, the file conversion can be done by libraries or other institutions, depending on the access format needed by the user. In that case, users do not need conversion apps and can access the format of their choice. It should be noted that the document to be converted has to be suitably prepared; the most important aspects are structure and navigation. If the source document is not properly structured, the resulting document will inherit the same deficiencies. For example, if a PDF without marked headings is converted to EPUB, the EPUB will not have headings either, which means no navigation options, no table of contents, etc. Thus, if we decide to convert one format to another, we need to be aware of the importance of the original file used for conversion. Listed below are some of the most widely used open source programs, which were also generally chosen by respondents of both user surveys presented in chapter 2.3 A brief overview of the surveys’ results on special needs of users and technical requirements for digitisation [38].

[38] It is also important to point out the use of the application ABBYY FineReader, which is not open source.
Blind and partially sighted users in our survey ranked it first among the apps that they use for conversion, presumably to access the full text when it is not available or to change a file format. Other converters that were pointed out are also freely available online.

3.5.1 Balabolka

Balabolka is Text-To-Speech (TTS) software that converts text into audio files (Figure 15). It uses the synthetic voices installed on a computer, reads various text file formats (for example DOC, DOCX, EPUB, FB2, HTML, MOBI, PDF, RTF, etc.) and exports to wav, mp3, mp4 and some other formats. The program also “allows to alter a voice's parameters, including rate and pitch. The user can apply a special substitution list to improve the quality of the voice's articulation” [39]. If the document was prepared following the steps in chapter 3.2.1 Creating accessible document, the same document can be converted into an audiobook using Balabolka and screen reading software in your language. Before conversion, we recommend a second remediation:
• adding text that announces the start of each new audio file (e.g. “start of file 3”) and its end (e.g. “end of file 3”);
• adding alt-text inside the document wherever there is a picture. The alt-text should be announced and followed by the picture description, for example: “Alternative text for picture 3: Picture shows…”;
• writing each chapter title in capital letters if separate audio files are to be created for each chapter. The splitting can then be done by recognising lines written entirely in capital letters;
• applying any other text remediation offered by the program, for example replacing numbers with words.
With these few steps and the audio settings applied, the program creates a whole audiobook in mp3 format within minutes.
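The first and third remediation steps above can also be prepared outside the program. The following minimal Python sketch splits a plain text at lines written entirely in capital letters and wraps each part in spoken start and end announcements. It does not call Balabolka in any way; the function names and the wording of the announcements are assumptions chosen for illustration.

```python
def is_chapter_title(line: str) -> bool:
    """A line counts as a chapter title if it contains letters and all are capitals."""
    stripped = line.strip()
    return any(ch.isalpha() for ch in stripped) and stripped == stripped.upper()


def split_chapters(text: str) -> list[str]:
    """Split the text into parts, starting a new part at every all-capitals line."""
    chapters: list[list[str]] = []
    for line in text.splitlines():
        if is_chapter_title(line) or not chapters:
            chapters.append([])
        chapters[-1].append(line)
    parts = ["\n".join(lines).strip() for lines in chapters]
    return [part for part in parts if part]  # drop parts that are empty


def add_announcements(chapters: list[str]) -> list[str]:
    """Wrap each part in announcements that the synthetic voice will read aloud."""
    return [
        f"Start of file {n}.\n{body}\nEnd of file {n}."
        for n, body in enumerate(chapters, start=1)
    ]


if __name__ == "__main__":
    sample = "CHAPTER ONE\nFirst text.\nCHAPTER TWO\nSecond text."
    for part in add_announcements(split_chapters(sample)):
        print(part)
        print("-" * 20)
```

Any short line that happens to be written in capitals (for example a single initial) would also start a new part, so the result should be checked before the text is converted to audio.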
Even if the process is faster than producing human narrated audiobooks, it is important to check the final outcome, delete any noise or return to the text to correct its deficiencies in order to get the best result.

[39] Cross+A (s.a.) Balabolka. Available on 4 March 2022 at http://www.cross-plus-a.com/balabolka.htm

Figure 15: Example of Balabolka interface for conversion (Source: http://www.cross-plus-a.com/bscrshot.htm)

3.5.2 Calibre

Calibre (Figure 16) is a “cross-platform open-source suite of e-book software. Calibre supports organizing existing e-books into virtual libraries, displaying, editing, creating and converting e-books, as well as syncing e-books with a variety of e-readers. Editing books is supported for EPUB and AZW3 formats.” [40] The program is mostly used for text-based documents and does not support conversion into audio formats. Calibre includes a viewer with which a person can read eBooks and adjust some basic features (font, size, margins, etc.). One of its interesting features is adjusting the text to the size of the viewer window. It supports many file formats for conversion: conversion is possible into eighteen different formats, including the most popular ones (PDF, EPUB, MOBI, FB2, RTF, DOCX, etc.). Before conversion, the user enters the preferred settings, edits the metadata, chooses the output format and starts the conversion. As noted before, the quality of the output depends on the quality of the input, so we strongly suggest providing a good input file. In addition, MOBI format can be converted into either EPUB or AZW3 format.

[40] Wikipedia. (2021). Calibre (software): https://en.wikipedia.org/wiki/Calibre_(software)

Figure 16: Example of Calibre interface for conversion (Source: https://manual.calibre-ebook.com/zh_CN/conversion.html).

3.5.3 RoboBraille

“RoboBraille is an e-mail and web-based service, capable of automatically transforming documents into a variety of alternate formats for the visually and reading impaired.” [41] (Figure 17).
It offers braille, audio (mp3 or DAISY), e-book (conversion to EPUB and MOBI format) and accessibility services (conversion of various other formats). For audio conversion it supports over 20 languages. The process consists of two simple steps: providing the source (uploading a document, entering a URL or pasting text) and selecting the output format. Depending on the selected output format, the system offers some additional options. The last step is to enter the e-mail address to which the converted material should be sent.

[41] RoboBraille (s.a.) Introduction to RoboBraille. Available on 4 March 2022 at https://www.robobraille.org/introduction-robobraille/

Figure 17: Example of RoboBraille interface for conversion

3.6 Examples of implementation

Figure 18 shows the general workflow for delivery file formats. Special delivery file formats require an additional workflow (Figure 19).

Figure 18: The workflow for delivery file format.

Figure 19: The workflow for special delivery file format

4 References

McNaught, A. and Alexander, H. (2014). Ebooks and accessibility. In: Woodward, H. (ed.) Ebooks in Education: Realising the vision. P. 35-49. London: Ubiquity Press. DOI: http://dx.doi.org/10.5334/bal.e
Monson, Jane D. (2017). Getting Started with Digital Collections: Scaling to Fit Your Organization. ALA Editions.
Pellegrino, G. (2018). Ebook formats evolution: the state of the art and future of digital publications. Fondazione LIA. Available on 4 March 2022 at https://renodo.org/en/wp-content/uploads/2018/11/Ebook-formats-evolution.pdf
Recommendations for metadata and data formats for online availability and long-term preservation. (2014). Succeed - The Support Action Centre of Competence in Digitisation. Available on 4 March 2022 at https://drive.google.com/file/d/1O4vTpDq2RJ6N4FQckjyx-NUshGKaFrxi/view
Reitz, Joan M. (2002). Dictionary for library and information science.
Available on 4 March 2022 at http://vlado.fmf.uni-lj.si/pub/networks/data/dic/odlis/odlis.pdf

5 Key relevant documents, guidelines and sources

5.1 Digitisation guidelines

DFG Practical Guidelines on Digitisation. (2016). Deutsche Forschungsgemeinschaft (DFG). Available on 4 March 2022 at https://www.dfg.de/formulare/12_151/
Digital Preservation Coalition. (2015). Digital Preservation Handbook. Available on 4 March 2022 at https://www.dpconline.org/handbook/technical-solutions-and-tools/file-formats-and-standards
Digital Preservation Coalition Wiki. (n. d.). File Formats Assessments. Available on 4 March 2022 at https://wiki.dpconline.org/index.php?title=File_Formats_Assessments
Digitization best practices and recommendations. (2019). National heritage digitization strategy (NHDS). Available on 4 March 2022 at https://cnhds.files.wordpress.com/2019/05/nhds-digitization-best-practices-and-recommendations-2019.pdf
Library of Congress Recommended Formats Statement 2021-2022. (2021). Library of Congress. Available on 4 March 2022 at https://www.loc.gov/preservation/resources/rfs/RFS%202020-2021.pdf
Library of Congress. (3. 25. 2019). Sustainability of Digital Formats: Planning for Library of Congress Collections. Format descriptions. Available on 4 March 2022 at https://www.loc.gov/preservation/digital/formats/fdd/descriptions.shtml
Recommendations on formats and standards useful in digitisation. Recommendations. (n. d.). Impact Centre of Competence. Available on 4 March 2022 at https://www.digitisation.eu/knowledge/library/recommendations-for-digitisation-projects/recommendations-formats-standards-recommendations/
Technical Guidelines for Digitizing Cultural Heritage Materials. Creation of Raster Image Files (2016). Federal Agencies Digitization Guidelines Initiative.
Available on 4 March 2022 at http://www.digitizationguidelines.gov/guidelines/FADGI%20Federal%20%20Agencies%20Digital%20Guidelines%20Initiative-2016%20Final_rev1.pdf
The Association for Library Collections and Technical Services (ALCTS). (2013). Minimum Digitization Capture Recommendations. Available on 4 March 2022 at https://www.ala.org/alcts/resources/preserv/minimum-digitization-capture-recommendations#books_with_images

5.2 OCR implementation guidelines

The OCR-D project. (n. d.). Workflows. Available on 4 March 2022 at https://ocr-d.de/en/workflows.src.html
Weber, E. (2020). Image Resolution: What are the optimal settings for OCR. ABBYY Help Center. Available on 4 March 2022 at https://support.abbyy.com/hc/en-us/articles/360017733239--Image-Resolution-What-are-the-optimal-settings-for-OCR.

5.3 Other technical guidelines

The DFG has a very detailed document with technical guidelines (resolution, colour depth, etc.) and advice that should be followed for digitisation. The document DFG Practical Guidelines on Digitisation can be found here: https://www.dfg.de/formulare/12_151/12_151_en.pdf
The "Initiative for Optical Character Recognition Development" also provides guidelines on ground truth creation for OCR tools such as Tesseract, as well as documentation on the PAGE format and its conversion to other formats such as ALTO (The Ground-Truth-Guidelines): https://ocr-d.de/gt//trans_documentation/index.html

5.4 Guidelines for creating accessible documents

Accessible Publishing Best Practices: Guidelines for Common EPUB Issues in Plain Language. (22. 8. 2019). National Network for Equitable Library Service (NNELS). Available on 4 March 2022 at https://www.accessiblepublishing.ca/wp-content/uploads/2019/08/AP-NNELS_Accessible_Publishing_Best_Practices_August_2019.pdf
Books for all: a starter kit for accessible publishing in developing and least developed countries. (n. d.). Accessible Books Consortium.
Available on 4 March 2022 at https://www.accessiblebooksconsortium.org/export/abc/abc_starter-kit_300616.pdf
Digital Publishing Toolkit Collective. (2015). From print to ebooks: a hybrid publishing toolkit for the arts. Institute of Network Cultures. Available on 4 March 2022 at https://networkcultures.org/blog/publication/from-print-to-ebooks-a-hybrid-publishing-toolkit-for-the-arts/
EBU clear print guidelines. (2016). European Blind Union. Available on 4 March 2022 at http://www.euroblind.org/sites/default/files/media/ebu-media/Guidelines-for-producing-clear-print.pdf
Garrish, M. (2012). Accessible EPUB 3. Available on 4 March 2022 at http://mtdh.ruralinstitute.umt.edu/blog/wp-content/uploads/Accessible_EPUB_3_sized.pdf
Gunn, D. (2016). Accessible eBook guidelines for self-publishing authors. Accessible Books Consortium; International Authors Forum. Available on 4 March 2022 at https://www.accessiblebooksconsortium.org/export/abc/abc_ebook_guidelines_for_self-publishing_authors.pdf
Hilderley, S. (2013). Accessible publishing: best practice guidelines for publishers. The International Publishers Association; The Federation of European Publishers; The International Association of Scientific, Technical and Medical Publishers. Available on 4 March 2022 at https://www.accessiblebooksconsortium.org/publishing/en/accessible_best_practice_guidelines_for_publishers.html
Pellegrino, G., Mussinelli, C. and Molinari, E. (9. 2. 2019). EBooks for all: towards an accessible digital publishing ecosystem. Fondazione LIA. Available on 4 March 2022 at https://www.fondazionelia.org/node/432

5.5 Guidelines for creating audiobooks

Audio presentation skills: Guidance from UKAAF. (2020). UK Association for Accessible Formats. Available on 4 March 2022 at https://www.ukaaf.org/wp-content/uploads/2021/02/G011-UKAAF-Audio-presentation-skills-September-2020.pdf
Audio reading skills: Guidance from UKAAF. (2020). UK Association for Accessible Formats.
Available on 4 March 2022 at https://www.ukaaf.org/wp-content/uploads/2021/02/G012-UKAAF-Audio-Reading-Skills-September-2020.pdf
Audio recording techniques: Guidance from UKAAF. (2020). UK Association for Accessible Formats. Available on 4 March 2022 at https://www.ukaaf.org/wp-content/uploads/2021/02/G010-UKAAF-Audio-recording-techniques-September-2020.pdf
Audiobooks: W3C Recommendation. (2020). The World Wide Web Consortium. Available on 4 March 2022 at https://www.w3.org/TR/audiobooks/
Specifications for the Digital Talking Book. (2012). National Information Standards Organization. Available on 4 March 2022 at https://groups.niso.org/apps/group_public/download.php/14650/Z39_86_2005r2012.pdf

5.6 Further useful sources

Digital Library Federation: https://www.diglib.org/
Digital Preservation Coalition: https://www.dpconline.org/
Digital Preservation Coalition Wiki: https://wiki.dpconline.org/index.php?title=Main_Page
European Blind Union: http://www.euroblind.org/
Federal Agencies Digitization Guidelines Initiative: http://www.digitizationguidelines.gov/
Impact Centre of Competence: https://www.digitisation.eu/
Inclusive publishing: https://inclusivepublishing.org/
International Digital Publishing Forum: http://idpf.org/
National heritage digitization strategy (NHDS): https://nhds.ca/
OCR-D. DFG-funded Initiative for Optical Character Recognition Development: https://ocr-d.de/
Succeed: The Support Action Centre of Competence in Digitisation: https://www.succeed-project.eu/home
The DAISY Consortium: https://daisy.org/
The DIAGRAM Center: http://diagramcenter.org/
The Library of Congress: https://www.loc.gov/preservation/resources/rfs/index.html
The World Wide Web Consortium (W3C): https://www.w3.org/
Web Content Accessibility Guidelines (WCAG): https://www.w3.org/WAI/standards-guidelines/wcag/

6 Vocabulary

ALTERNATIVE TEXT (ALT-TEXT) – Alternative text provides a textual description for non-text content (pictures, graphics, diagrams …).
ASSISTIVE TECHNOLOGIES – “… any item, piece of equipment, software program, or product system that is used to increase, maintain, or improve the functional capabilities of persons with disabilities.” (Source: ATIA, https://www.atia.org/home/at-resources/what-is-at/)
BIT DEPTH – “… amount of storage space allocated to an individual pixel” in a digital image (Source: Monson, 2017, p. 76). The bit depth of a screen pixel determines the total number of colors that can be displayed (see colour depth). (Source: PCMag.com, https://www.pcmag.com/encyclopedia/term/bit-depth).
COLOUR DEPTH – “The number of bits used to hold a screen pixel. Also called "pixel depth" and "bit depth," the color depth is the maximum number of colors that can be displayed. True Color (24-bit color) is required for photorealistic images and video, and modern graphics cards support this bit depth.” (Source: PCMag.com, https://www.pcmag.com/encyclopedia/term/color-depth).
DELIVERY FILE FORMAT – the final file format accessed by the users.
DENOISING – in relation to images, the various processes of noise reduction used to provide more accurate and visually pleasing images.
EBOOK – Usually, the term eBook refers to born-digital publications. However, we use the term eBook especially to refer to digital publications produced as a result of digital conversion, including formats for special needs (audiobooks), which is also the aim of the EODOPEN project.
IMAGE CAPTURING – scanning.
IMAGE PROCESSING – “Image processing is a method to perform some operations on an image, in order to get an enhanced image or to extract some useful information from it.” (Source: Digital Image Processing, University of Tartu, https://sisu.ut.ee/imageprocessing/book/1).
MOBILE DEVICES – mobile phones or smartphones, laptops and tablet computers.
PARTIALLY SIGHTED – “People who are partially sighted are not completely blind but are able to see very little.” (Source: Cambridge Dictionary, https://dictionary.cambridge.org/dictionary/english/partially-sighted). Used for visually impaired.
PRINT DISABLED – “The term “print disabled” was coined by George Kerscher, Ph.D. around 1989 to describe persons who could not access print. He used it to refer to: A person who cannot effectively read print because of a visual, physical, perceptual, developmental, cognitive, or learning disability.” (Source: https://myblindspot.org/mbs-accessibility-defined/).
RESOLUTION – “Resolution is expressed in pixels per inch (ppi) or the often-interchangeably used dots per inch (dpi); it is essentially a measurement of the density of pixels in a given area and is dependent on pixel size.” (Source: Monson, 2017, p. 69).
SCREENREADER – “Screenreaders perform a text to speech role, but also allow audio-only access to the menus and other features of the delivery platform” (McNaught and Alexander, 2014).
SPATIAL RESOLUTION – “The spatial resolution of a digital image refers to the level of spatial detail that it contains, with higher resolution denoting greater detail, clarity, and sharpness.” (Source: Monson, 2017, p. 69).
VISUALLY IMPAIRED – see partially sighted.
TEXT TO SPEECH – “Text to speech is a mature technology that allows text on screen to be voiced by software.” (McNaught and Alexander, 2014).

7 Used acronyms

AI – artificial intelligence
DAISY – Digital Accessible Information System
EBU – European Blind Union
EOD – eBooks on Demand service provided by approx. 40 European libraries
EODOPEN – eBooks-On-Demand-Network Opening Publications for European Netizens – European project co-financed under the Creative Europe programme from 2019-2023
IP – intellectual property
OCR – Optical Character Recognition
W3C – World Wide Web Consortium
WCAG – Web Content Accessibility Guidelines
WHO – World Health Organisation