FMT 7,8,9,10

I wonder if many people have had a chance to dig around in the latest signature release for DROID?

There is an interesting format change that I think warrants some discussion.

As I understand it, TIFF has historically been a problem for DROID-based identification. PRONOM has records for TIFF versions 3, 4, 5, and 6, but their signatures all matched the same hex strings, meaning that the best DROID could do was return a multiple match across the corresponding PUIDs.
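To make the multiple-match problem concrete, here is a minimal sketch in Python. It is not DROID's actual matching logic: the signature table is a simplified assumption in which all four version records share the two standard TIFF byte-order headers, the pairing of PUIDs to versions is assumed from the order given above, and `identify` is a hypothetical helper.

```python
# Simplified stand-in for a signature file. Every TIFF version record
# carries the same byte sequences (an assumption for illustration only).
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")  # little-endian and big-endian headers

SIGNATURES = {
    "fmt/7": TIFF_MAGIC,   # TIFF 3 (version pairing assumed)
    "fmt/8": TIFF_MAGIC,   # TIFF 4
    "fmt/9": TIFF_MAGIC,   # TIFF 5
    "fmt/10": TIFF_MAGIC,  # TIFF 6
}


def identify(data: bytes) -> list[str]:
    """Return every PUID whose signature matches the start of the data."""
    return [
        puid
        for puid, patterns in SIGNATURES.items()
        if any(data.startswith(p) for p in patterns)
    ]


header = b"II*\x00\x08\x00\x00\x00"  # first bytes of a little-endian TIFF
matches = identify(header)
print(matches)
```

Because the byte patterns are identical for every record, any TIFF header matches all four PUIDs at once, and nothing in the signature alone can separate the versions.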

There has been some discussion over the past 12 months (probably more) about how to derive a single PUID.

It looks like the answer has been to deprecate fmt 7/8/9/10 in favour of a single TIFF-based PUID that uses the one signature previously shared across fmt 7/8/9/10.

I can see the logic, and actually the approach is something we have been doing in New Zealand for a while, resolving the multiple PUIDs into a single identifier.

But I think it was a mistake to deprecate fmt 7/8/9/10, and hopefully these graphics will explain why.

Prior to the change, we can visualise the TIFF PRONOM descriptors like this: http://imgur.com/799FQ.jpg

There is a parent class with no PRONOM label, and hierarchical links progressing through the format versions (via the [Related file formats] tags in PRONOM).

The move to deprecate fmt 7/8/9/10 means we can visualise the TIFF PRONOM descriptors like this: http://imgur.com/eRqmG.jpg

There is now a single class, with no lower-level descriptor for TIFF version.

I propose that the change should have been made along the lines of: http://imgur.com/9wafA.jpg

The change I propose is simple. We ‘un-deprecate’ fmt 7/8/9/10, but keep the new higher class of general TIFF descriptor, fmt/353.

I justify this proposal by saying that the change to PRONOM seems to be more about reliably identifying files against discrete PUIDs held as records in PRONOM than about describing formats.

I would argue that we still need the ability to describe TIFF objects at a version level, but we can no longer do so via the ‘language’ of PRONOM.
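As a rough sketch of what the proposed arrangement might look like in practice, the Python below keeps the version-level records and links each to the general record as its parent. The `FormatRecord` class, the record names and both helper functions are my own illustrative assumptions, not anything that exists in PRONOM or DROID.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FormatRecord:
    puid: str
    name: str
    parent: Optional[str] = None  # PUID of the more general record, if any


# Hypothetical registry reflecting the proposed hierarchy.
REGISTRY = {
    record.puid: record
    for record in [
        FormatRecord("fmt/353", "TIFF (general)"),
        FormatRecord("fmt/7", "TIFF 3", parent="fmt/353"),
        FormatRecord("fmt/8", "TIFF 4", parent="fmt/353"),
        FormatRecord("fmt/9", "TIFF 5", parent="fmt/353"),
        FormatRecord("fmt/10", "TIFF 6", parent="fmt/353"),
    ]
}


def identification_puid(puid: str) -> str:
    """Walk up to the top-level record, the level a shared signature can assert."""
    record = REGISTRY[puid]
    while record.parent is not None:
        record = REGISTRY[record.parent]
    return record.puid


def versions_of(puid: str) -> list[str]:
    """List the version-level records that sit directly under a general record."""
    return [r.puid for r in REGISTRY.values() if r.parent == puid]


print(identification_puid("fmt/10"))
print(versions_of("fmt/353"))
```

Identification can then report the general record, while anyone who knows the version by other means can still describe the object with the more specific PUID.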

This is a loss of information, and I hope it is something that will be revisited. There are some large implications for the many of us who use both TIFF files and PRONOM as the language of format description.

There is much work to be done with the TIFF format, not least to understand how we describe ‘side data’ sufficiently, and where we draw a boundary around format variation. I feel that simply moving to a single descriptor for a whole class of format that we all use is troublesome, and takes us further away from an adequate level of format description.

I really don’t want this to appear as a rant against PRONOM. I have a huge amount of respect and appreciation for the efforts that go into making this data available to us. However, I do think we should be having a slightly broader conversation about the persistence of PRONOM records…

1 comment

andy jackson wrote 5 weeks 3 days ago

Inheritance or something…

It’s also worth noting that adding some sort of inheritance or other parent/child relation to PRONOM could also help normalise the data somewhat. There is an awful lot of repetition between the PRONOM records for similar formats, and a Format object that inherits/overrides properties from a Parent object would make the data a lot neater. It would, of course, make using the data slightly more complicated, as any viewer would have to examine all the parents to determine all the fields.
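A minimal sketch of that inherit/override idea, in Python. The record identifiers, field names and values here are invented for illustration and are not PRONOM's actual schema; `resolve` is a hypothetical helper showing the extra work a viewer would have to do.

```python
# Hypothetical records: each stores only the fields it adds or overrides.
RECORDS = {
    "tiff": {
        "parent": None,
        "fields": {"name": "TIFF", "mime": "image/tiff", "extension": "tif"},
    },
    "tiff-6": {
        "parent": "tiff",
        "fields": {"name": "TIFF 6", "version": "6"},
    },
}


def resolve(record_id: str) -> dict:
    """Collect fields from the record and all its ancestors; children win."""
    chain = []
    current = record_id
    while current is not None:
        record = RECORDS[current]
        chain.append(record["fields"])
        current = record["parent"]

    resolved = {}
    for fields in reversed(chain):  # apply the root first so children override
        resolved.update(fields)
    return resolved


print(resolve("tiff-6"))
```

The child record stays small, but nothing can be read from it in isolation: the full set of fields only appears once the whole parent chain has been walked.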

And to go back to my earlier rant, this change appears to arise from some confusion as to whether a format is defined by an internal signature or by a specification. If PRONOM was just a format identification system (i.e. was just DROID), and PUIDs were just being minted for internal signatures, then this change would make perfect sense. They have deprecated the old, degenerate signatures and replaced them with a new one. However, if PUIDs refer to format specifications or implementations, which is how many of us see it, then this change seems quite strange.

FWIW, in Planets, we side-stepped this parent/child issue by allowing formats to be specified as URIs corresponding to MIME types, file extensions, or PUIDs, and leveraging the DROID signature file to map between the different identification schemes. In my opinion, this identification stack, i.e. the set of internal signatures and the identifiers they can be mapped to, should be kept in a separate system from the rest of the format RI, with the two linked via the identifiers.
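For a flavour of that kind of mapping, here is a small Python sketch. It is not the Planets code: the `example:` URI scheme is made up, and `ROWS` is a hand-written stand-in for the PUID, MIME type and extension associations one would read out of a DROID signature file.

```python
from collections import defaultdict

# Hand-written stand-in for associations taken from a signature file
# (values are illustrative assumptions).
ROWS = [
    {"puid": "fmt/353", "mime": "image/tiff", "extensions": ["tif", "tiff"]},
]


def uris_for(row: dict) -> set[str]:
    """Express one row as a set of format URIs in a made-up 'example:' scheme."""
    uris = {f"example:puid/{row['puid']}", f"example:mime/{row['mime']}"}
    uris.update(f"example:ext/{ext}" for ext in row["extensions"])
    return uris


def build_index(rows: list[dict]) -> dict[str, set[str]]:
    """Map every URI to all URIs that share a row with it."""
    index = defaultdict(set)
    for row in rows:
        uris = uris_for(row)
        for uri in uris:
            index[uri] |= uris
    return index


INDEX = build_index(ROWS)


def equivalents(uri: str) -> list[str]:
    """Return the other identifiers that the given format URI maps to."""
    return sorted(INDEX.get(uri, set()) - {uri})


print(equivalents("example:ext/tif"))
```

The index knows nothing about what a format is; it only records which identifiers co-occur, which is why it can live apart from the descriptive records.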
