Would it be easier with LinkedData?

A couple of people have asked me if my experiments with Pronom and Fido would have been easier if Pronom had been available as RDF or LinkedData.  The short answer to this question is ‘no’.  Let me explain why.

Parsing the Pronom XML is actually very easy.  The schema is straightforward.  Having an entire format specification in one XML document ensured that I didn’t miss any bits, and it made exploring and understanding the underlying conceptual model very easy.  I didn’t need the parts; I only wanted the whole.  In addition, the whole is quite small.  At the moment, all of Pronom fits in about 730 XML documents, which come to less than 700 KB in a Zip file.  That is smaller than the PDF report that documents the signature syntax and algorithm.

There are two trivial changes that could be made to Pronom and Droid that would have made the exercise much easier.

  1. Provide access to the Droid Signature file via a simple HTTP request.  Doing this would make it easier to fetch, either by hand or automatically.  At present it is hidden behind a web services interface, which adds several extra layers of technology for no discernible benefit.
  2. Provide access to the Pronom XML documents via a single simple HTTP request, returning them as a Zip file.  Alternatively, a single page listing all of the format identifiers or URLs would be almost as good.
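To illustrate how little machinery the second suggestion would need, here is a minimal Python sketch that assumes a hypothetical Zip of all the Pronom XML documents, retrievable with one plain HTTP GET.  No such download exists; `fetch_bundle`, `parse_bundle` and the default element name `FormatName` are illustrative assumptions, not a documented Pronom API or a verified reading of its schema.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.request import urlopen


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so matching ignores XML namespaces."""
    return tag.rsplit("}", 1)[-1]


def fetch_bundle(url: str) -> bytes:
    """Fetch the whole (hypothetical) Zip with one plain HTTP GET."""
    with urlopen(url) as response:
        return response.read()


def parse_bundle(zip_bytes: bytes, field: str = "FormatName") -> Dict[str, List[str]]:
    """Parse every .xml member of an in-memory Zip and collect the text of
    each element whose local name equals `field` (an assumed element name)."""
    results: Dict[str, List[str]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as bundle:
        for member in bundle.namelist():
            if not member.lower().endswith(".xml"):
                continue  # skip anything that is not an XML document
            root = ET.fromstring(bundle.read(member))
            results[member] = [
                (element.text or "").strip()
                for element in root.iter()
                if local_name(element.tag) == field
            ]
    return results
```

Usage would be a single line such as `parse_bundle(fetch_bundle(url))`, where `url` stands for whatever address the Zip might be published at.  Because the whole collection is under a megabyte, holding it in memory and walking every document is cheap.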

If the Droid signature information or Pronom format information had been available as LinkedData, access would have been easier than going through web services.  That would have been a small improvement.  It would have been harder, however, to retrieve all of the relevant parts and assemble them back together again.  It would also have been much harder to understand how all of the parts worked together and, ironically, much harder to understand the underlying conceptual model.

The LinkedData approach would be a much better fit if the Pronom information were larger (thousands of formats) or changed more rapidly (daily or weekly).  It might also be a reasonable fit for combining format information from multiple sources.  I do love LinkedData, but not for every job.
