Watching Stars

I'm not an astronomer but if I were I'd probably get excited watching the birth of a star. What I do get excited about is being around to watch the creation and evolution of a digital preservation problem right here and now.

The story of the ebook is an old one, but electronic book readers arrived in 1998. The RocketBook and its rival, the SoftBook, must have been before their time because they did not survive (other than in the DNA of all modern ebook readers), but they started a format war that continues to rage today. Wikipedia lists some 27 ebook file formats. This list is pared down to 7 by ebookarchitects - a company offering conversion services - who describe the rest as "essentially defunct and useless". That is roughly one format created and one format deprecated each year. How long will it be until these formats become unreadable? That we already have companies offering conversion services is itself indicative of a problem. In a situation not unlike the early days of word processing, we are witnessing the rapid growth of a sector and its diverse formats, we're in the middle of the battle for supremacy, and we're already starting to bet on the winners. Will we be right? How long will it be until those old formats are long forgotten?

Even researching those older formats is an exercise in digital preservation. Reading around this post, I encountered a link to a Geocities page. Contemporary articles on the preservation of ebooks already contain broken links, albeit mostly to publishers. Early companies have been swallowed up and who knows where their format documentation is now? (If you do, let me know!) PRONOM knows nothing of a RocketBook file format (.rb is Ruby, right?), though Friday may solve that. All is not lost. Calibre is a very capable ebook reader that opens even those lost rb files, including the pixelated cover images that make me nostalgic, and MobileRead is a very comprehensive wiki on the subject, including a useful list of conversion tools - SCAPE components anyone?

File format divergence is one problem. Digital rights management (DRM) is another. Encryption is such a problem that in his report to the National Library of the Netherlands, Johan van der Knijff recommends that the Library "do not accept EPUB publications that contain digital rights management and encryption features". Here in the UK the detail of British electronic legal deposit legislation is still unclear, but the British Library guidance suggests it is unlikely we can mandate deposited formats and thus, presumably, protection mechanisms. This may be a problem we have to face, but I don't know, and with the predicted demise of DRM for ebooks perhaps we'll never have to worry. Or perhaps our digital preservation systems just became one sub-system more complex, as we maintain keys for the books and provide secure reading environments for our readers?
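For the curious, a recommendation like Johan's could be screened for at ingest. What follows is a minimal sketch, not his method or anybody's production tool. It relies on an EPUB being a zip container in which META-INF/encryption.xml and META-INF/rights.xml are reserved names for encryption and rights information. The function names and the accept/reject policy are my own hypothetical illustration. Note that encryption.xml is also used for font obfuscation, so a hit is a prompt for a closer look and not proof of DRM.

```python
import zipfile

# Reserved names in the EPUB container that signal encrypted resources or
# rights information. Presence is a flag for review, not proof of DRM:
# encryption.xml is also used for font obfuscation.
PROTECTION_MARKERS = ("META-INF/encryption.xml", "META-INF/rights.xml")


def find_protection_markers(epub_path):
    """Return the protection-related container files present in an EPUB."""
    with zipfile.ZipFile(epub_path) as container:
        names = set(container.namelist())
    return [marker for marker in PROTECTION_MARKERS if marker in names]


def acceptable_for_deposit(epub_path):
    """Hypothetical policy check: accept only EPUBs with no markers at all."""
    return not find_protection_markers(epub_path)
```

Calling find_protection_markers("some-book.epub") returns an empty list for a file with neither marker. A stricter workflow would go on to parse encryption.xml and see which resources are actually encrypted.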

While DRM is most likely a problem that will be solved with policy and licensing decisions, we may also be able to relax, safe in the knowledge that all encryption seems to fail and that the machines of the future will laugh at our attempts to protect our content. Even now it is possible, if you know where to look, to obtain software to remove DRM from ebook formats. I do not suggest you try, but perhaps we can make the case for preserving the cracking software alongside the book content?

And yet perhaps I am overstating the problem. I have heard people say they don't care if those old format ebooks are lost in time. The dial-up delivered SoftBook content is not unique and somewhere the paper copy is safe. It may also now be available as a modern, acid-free, preservation-grade EPUB(3?). And yet I can't help but wonder what would happen if we found, in the digital research papers of an eminent scientist, an encrypted antique ebook whose title is unidentifiable and yet which has clearly influenced the invention of the faster-than-light drive? (The only readable thing I could find in a sample RocketBook file, which I don't think was encrypted (though it might've been compressed), was "NUVO".)

Johan's announcement of his report on EPUB notes that "surprisingly little information is available on the format's suitability for archival preservation". I trawled the titles of all the papers ever presented at an iPRES conference and 'ebook' was not mentioned once. (It is possible I missed it.) Why, when we're in the midst of an emergent and exciting preservation problem, one we can hope to influence, is this the case? Maybe we're done? The world is settling on commercial variants of EPUB and Mobipocket and so the problem, as with modern word processing, has largely gone away - at least in the medium term? Maybe, in spite of the rapid growth of ebook markets, we've yet to see this new format enter the doors and the minds of large institutions - the kinds of places and people who worry about the long-term?

Maybe we're just not there yet? Maybe we're just distracted by the star being born?

It only seems right to leave you with another Rocket Book.

Comments

Gary McGath

Pirates may turn out to be the salvation of e-books under DRM. As people break DRM schemes and make unencrypted copies, they're preserving the books in a kind of unintentional (or in some cases maybe intentional) underground movement. In the US at least, you're on shaky legal ground if you even develop software to break DRM, and preserving it might run into the same issues, besides which you risk the possibility of malware.

Future preservation efforts may include a scramble to find and archive those illegal copies. It all feels a little like the end of Fahrenheit 451, where there's an underground movement to preserve books in people's memories.