Jay Gattuso's blog

Investigating PRONOM EOF patterns and DROID 'Fast' Scanning

Following on from the interesting discussion in my last post about the jpeg signatures, I undertook some quick testing on the impact of using / not using the EOF sections of a DROID signature file.

I previously posted this signature file here: http://dl.dropbox.com/u/59534857/DROID_SignatureFile_V55%20-%20no%20EOF....

 

Can we talk about fmt/42, fmt/43 and fmt/44?

In a relatively recent signature update, the fmt/44 signature was updated to allow some data after the stated EOF marker (ff d9).

In the case that started this off, a number of fmt/44 jpg files were found that had a couple of bytes after what DROID looks for as an absolute EOF.

I had a look into the JPEG specs, trying to unravel this story: were the extra bytes useful to someone? Were we missing something by ignoring them?
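To make the "bytes after the EOF marker" case concrete, here is a minimal Python sketch that reports whatever follows the last ff d9 pair in a byte string. The function name and the toy sample are my own illustration, not anything taken from DROID. It also simplifies: it assumes the last ff d9 in the data is the real end-of-image marker, which will not hold if the trailing data happens to contain that pair itself.

```python
from typing import Optional

EOI = b"\xff\xd9"  # the ff d9 end marker the signature looks for


def trailing_bytes_after_eoi(data: bytes) -> Optional[bytes]:
    """Return the bytes after the last ff d9 pair, or None if no pair exists.

    Assumption: the last ff d9 in the data is the true end-of-image marker.
    """
    pos = data.rfind(EOI)
    if pos == -1:
        return None
    return data[pos + len(EOI):]


# Toy data (hypothetical): start marker, filler, end marker, two stray bytes.
sample = b"\xff\xd8" + b"\x00" * 4 + EOI + b"\x0d\x0a"
extra = trailing_bytes_after_eoi(sample)
print(len(extra), extra.hex())
```

An empty result means the file ends exactly at the marker. Anything else is the kind of tail that a strict EOF match would reject.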

FMT 7,8,9,10

I wonder if many people have had a chance to dig around in the latest signature release for DROID?

There is an interesting format change that I think warrants some discussion.

As I understand it, TIFF has historically been a problem for DROID-based identification. PRONOM has records for TIFF versions 3, 4, 5, and 6, but the signatures all match the same hex strings, meaning that the best DROID can do is return a multiple match against the corresponding PUIDs.
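A rough Python sketch of why a header match alone cannot separate the versions. It assumes the shared hex strings are the standard TIFF header bytes (49 49 2a 00 for little-endian, 4d 4d 00 2a for big-endian). The function name is mine, and this is not how DROID itself is implemented.

```python
import struct
from typing import Optional, Tuple


def tiff_header(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (byte_order, magic) if data starts with a TIFF header, else None."""
    if len(data) < 4:
        return None
    order = data[:2]
    if order == b"II":      # little-endian ("Intel") byte order
        fmt = "<H"
    elif order == b"MM":    # big-endian ("Motorola") byte order
        fmt = ">H"
    else:
        return None
    (magic,) = struct.unpack(fmt, data[2:4])
    # The magic number is 42 whatever the revision of the spec,
    # so nothing in these four bytes identifies the TIFF version.
    return (order.decode("ascii"), magic) if magic == 42 else None


print(tiff_header(b"II*\x00"))  # little-endian header
print(tiff_header(b"MM\x00*"))  # big-endian header
```

Both calls return the same magic number, and that is all the header offers. Telling versions apart would need a look at something beyond these bytes.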

Scenario for discussion: Text files.

I would like to pose a scenario for your comment:

Description 

A large set of files, ~5,000.

Created between ~1993 and ~1997

Creation software unknown

Given extension .ASC

PRONOM PUIDs:

x-fmt/22 (7-bit ASCII Text) and x-fmt/283 (8-bit ASCII Text) relate. DROID matches by extension, as above.

JHove: ASCII-hul (Status: Well-Formed and valid)
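Since an extension match cannot choose between the two PUIDs, one simple content check is whether any byte has its high bit set. A minimal Python sketch follows. The function name and the mapping of results onto x-fmt/22 and x-fmt/283 are my own assumption for illustration; neither DROID nor JHove is being quoted here.

```python
def classify_text_bytes(data: bytes) -> str:
    """Label data '7-bit' if every byte is below 0x80, otherwise '8-bit'.

    Assumed mapping: '7-bit' ~ x-fmt/22, '8-bit' ~ x-fmt/283.
    Empty input is reported as '7-bit', since no byte has the high bit set.
    """
    return "7-bit" if all(b < 0x80 for b in data) else "8-bit"


print(classify_text_bytes(b"plain old text\r\n"))
print(classify_text_bytes(b"caf\xe9"))  # 0xe9 has the high bit set
```

Run over the whole set, a check like this would at least show how many of the ~5,000 files actually contain 8-bit characters.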