Preserving electronic publications

I found myself sitting in the classics library of one of Perth’s beautiful private colleges the other day… staring at the 1875 edition of the Encyclopedia Britannica. Taking a volume off the shelf and flipping through the fragile, slightly yellowed pages brought on a feeling of nostalgia… I couldn’t help but wonder how, 135 years from now (in 2145), someone might sit in a virtual (digital) library and look through the unique books we are publishing today in electronic formats.

1875 Encyclopedia Britannica

Frank Romano (2002) published an interesting paper, “E-Books and the Challenge of Preservation”, in which he points out that “Libraries and information repositories face a continuing challenge in maintaining files … ”. He identifies issues related not only to file formats but also to storage hardware and media and to operating systems, and notes that over time even the data coding system and metadata might change. Romano concludes his article with a bold statement, “Libraries and other data repositories must take a more active role in shaping the future of e-publishing”, and observes that, at the time of writing, little consideration was being given to the preservation of electronic books.

Trends in digital preservation point to PDF (specifically PDF/A) as a transformation format for documents that combine text and graphics, as well as for scanned images (National Archives, 2004). PDF/A is the ISO standard (ISO 19005) for the long-term preservation of electronic documents. Web-based content would most likely be preserved as Hypertext Markup Language (HTML) or XML, while TIFF and JPEG are both referred to as preservation formats for graphics.

EPUB construct

It is pretty easy to break open any EPUB file and get to both its content and its metadata. The screenshot above, taken in PDFXML Inspector (available from Adobe Labs), clearly shows how EPUB content is stored. Electronic publications are by nature reflowable and can be read on different devices, much like web pages. This means that preservation should focus not on the display, or the look and feel, of the book itself but on its content. Library and Archives Canada (2010) refers to EPUB as a recommended preservation and standard format that “addresses the content and presentation without digital rights management (DRM)”. This statement not only eases my original concern about electronic book preservation, it also confirms the importance of a standardized format.
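To illustrate how little is needed to “break open” an EPUB, here is a minimal Python sketch that uses only the standard library. It assumes the usual EPUB container layout: the file is a ZIP archive holding a `mimetype` entry, a `META-INF/container.xml` that points to the package (OPF) document, and Dublin Core metadata inside that OPF. The function name `read_epub_metadata` is hypothetical, and the sketch reads only the first value of a few Dublin Core fields, ignoring refinements and repeated entries.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the EPUB container file and the OPF package document.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_epub_metadata(path):
    """Hypothetical helper: return (opf_path, metadata dict, archive members).

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as zf:
        members = zf.namelist()

        # An EPUB identifies itself through a plain-text `mimetype` entry.
        mimetype = zf.read("mimetype").decode("ascii").strip()
        if mimetype != "application/epub+zip":
            raise ValueError(f"not an EPUB: mimetype is {mimetype!r}")

        # container.xml tells us where the package (OPF) document lives.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")

        # Pull a few Dublin Core fields out of the OPF <metadata> element.
        opf = ET.fromstring(zf.read(opf_path))
        metadata = {}
        for field in ("title", "creator", "language", "identifier"):
            element = opf.find(f"opf:metadata/dc:{field}", OPF_NS)
            metadata[field] = element.text if element is not None else None

        return opf_path, metadata, members
```

A call such as `read_epub_metadata("book.epub")` (the filename is only an example) returns the location of the package document, a small metadata dictionary, and the list of files in the archive, which includes the XHTML content documents themselves. Because everything inside is ordinary ZIP, XML and XHTML, the content remains readable with generic tools even if a particular reading application disappears.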

I would add to Romano’s statement that not only must libraries and other data repositories take a leading role in shaping the future of electronic publishing, but publishers must take an equal role in choosing their digital publication formats. It would also be interesting to review the impact DRM might have on digital preservation…


