During the past couple of weeks, there have been some thoughtful and well-informed discussions about Fido, Droid, Pronom, and file format identification in the comment stream of this blog. They make interesting reading.
In a recent comment, Shaun Zevin raises some points about the algorithmic complexity of the Droid and Fido pattern matching.
I’m not at the OPF Hackathon this week in the Netherlands, and I’ll admit to being slightly envious of those who are! The idea behind the Hackathon is to bring practitioners and developers together to exchange goals and ideas intensively, collect use cases, show each other tools and approaches, and do some quality coding.
Content is King. The key to a good file format registry is not software; it’s not user interface; it’s not governance. The key is content, content, content. We will all win if we have a registry whose content is usable, accurate, and comprehensive.
I have a challenge for developers in the digital preservation community: can we build a file format registry without building any new software systems at all?
Fido is a simple format identification tool for digital objects that uses Pronom signatures. It converts signatures into regular expressions and applies them directly. Fido is free, Apache 2.0 licensed, easy to install, and runs on Windows and Linux. Most importantly, Fido is very fast.
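To make the "signatures into regular expressions" idea concrete, here is a minimal sketch in Python. It is not Fido's actual code: the function names `signature_to_regex` and `identify` are hypothetical, the signature table is illustrative rather than copied from Pronom, and only a small subset of the signature syntax is assumed (hex byte pairs, `??` for any single byte, `{n}` and `{n-m}` for gaps, and `*` for a gap of any length).

```python
import re


def signature_to_regex(sig: str) -> bytes:
    """Translate a simplified Pronom-style byte sequence into a bytes regex.

    Assumed subset only: hex pairs, '??', '{n}', '{n-m}', '{n-*}' and '*'.
    """
    parts = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":
            # Arbitrary-length gap.
            parts.append(b".*")
            i += 1
        elif sig[i] == "{":
            # Fixed or bounded gap: {n}, {n-m} or {n-*}.
            end = sig.index("}", i)
            low, _, high = sig[i + 1:end].partition("-")
            if high == "":
                quantifier = "{%s}" % low
            elif high == "*":
                quantifier = "{%s,}" % low
            else:
                quantifier = "{%s,%s}" % (low, high)
            parts.append(b"." + quantifier.encode("ascii"))
            i = end + 1
        elif sig.startswith("??", i):
            # Any single byte.
            parts.append(b".")
            i += 2
        else:
            # A literal byte written as two hex digits; escape it so that
            # bytes such as '.' or '-' are not read as regex syntax.
            parts.append(re.escape(bytes([int(sig[i:i + 2], 16)])))
            i += 2
    return b"".join(parts)


def identify(data: bytes, signatures: dict) -> list:
    """Return the names of all signatures that match at the start of data."""
    hits = []
    for name, sig in signatures.items():
        # DOTALL lets '.' match newline bytes as well.
        pattern = re.compile(signature_to_regex(sig), re.DOTALL)
        if pattern.match(data):
            hits.append(name)
    return hits


# Illustrative table, not real Pronom records.
SIGNATURES = {
    "pdf": "255044462D312E",      # "%PDF-1."
    "png": "89504E470D0A1A0A",    # PNG magic number
    "demo": "4142{2}43??44",      # "AB", two bytes, "C", one byte, "D"
}

print(identify(b"%PDF-1.4\nrest of file", SIGNATURES))
```

Here `re.match` anchors each pattern at the first byte, which corresponds to a signature tied to the beginning of the file. A real tool would also need to handle signatures anchored at the end of the file, byte ranges, alternatives, and priorities between formats, all of which this sketch leaves out.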
A couple of people have asked me if my experiments with Pronom and Fido would have been easier if Pronom had been available as RDF or LinkedData. The short answer to this question is ‘no’. Let me explain why.
Last time, I discussed Pronom and Droid. We had a quick look at the compiled (nearly unreadable) pattern information that the Droid signature file holds and the uncompiled (but still hard to read) representation that is stored in Pronom.
Pronom and Droid, developed primarily at the National Archives (TNA) of the United Kingdom, have been a key contribution to the digital preservation community. Pronom is a registry of information about file formats. TNA maintains the registry and provides online access to it at http://www.nationalarchives.gov.uk/PRONOM. Droid is a software application that uses some of that file format information to identify the format of specific digital objects.