Basic guide to writing new format signatures

I have attached a basic introduction to writing format signatures. It was written to assist new colleagues, or anyone new to the idea of writing basic signatures for PRONOM, for use with DROID.

 

I hope that you find it useful, and if I've made any blunders please let me know!

Attachment: how to write a sig file v1.1.pdf (1.26 MB)

Comments

Jay, that's a very useful guide. However, there's a small problem with the description you give of the Shift elements in the DROID XML. These are not the reversed positions of the bytes in the sequence. For each byte value, the Shift is the distance from the end of the sequence to the nearest occurrence of that value, with the final byte counting as position 1. If the same byte value appears more than once in a sequence, only the smallest position for that byte value is given.
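To make that rule concrete, here is a minimal Python sketch of the calculation as described above. The function name `shift_table` and the example sequences are made up for illustration and are not taken from DROID. The sketch only covers the per-byte Shift values, not DEFAULTSHIFT or the EOF case.

```python
def shift_table(sequence_hex: str) -> dict[str, int]:
    """Return {byte value as hex: distance from the end}, where the last byte is 1.

    Hypothetical helper for illustration only; not part of DROID.
    """
    data = bytes.fromhex(sequence_hex)
    shifts: dict[str, int] = {}
    for index, value in enumerate(data):
        distance = len(data) - index  # the final byte gets distance 1
        # Walking left to right, distances only decrease, so overwriting
        # keeps the smallest distance for a repeated byte value.
        shifts[f"{value:02X}"] = distance
    return shifts


# Five distinct bytes: each value simply gets its distance from the end.
print(shift_table("255044462D"))
# A sequence with repeats: only the occurrence nearest the end is kept.
print(shift_table("41424142"))
```

In the second call, the byte values 41 and 42 each appear twice, and only the occurrence closest to the end contributes to the result.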

This is very important to get right for versions of DROID earlier than 6, as these values are what is used to search safely with the Boyer-Moore-Horspool algorithm.

However, for DROID 6, all the SHIFT elements and DEFAULTSHIFT elements are entirely ignored! The software calculates these values itself from the sequence information provided. So it is entirely legal to provide DROID 6 with signature files without any of these elements. This makes it much simpler to hand-craft a signature if necessary (and is hopefully the beginning of separating the "how" DROID searches for signatures from the "what" DROID is looking for).

(IMHO, the DROID signature file should simply provide the signatures as regular-expression like strings with a minimum of additional metadata, leaving it to DROID (or some other tool) how to process them efficiently.  I should probably add that proposal to the DROID 7 wiki...)

Awesome, Thanks Matt.

I struggled with that bit, to be honest. It seems like a complex step, but it's actually very simple if you just match what you can see in other examples (the unordered layout of the existing shifted patterns doesn't help the novice user here either). I wanted to make it very clear and cover the method as simply as possible, without going into the details too much. I also completely skipped the EOF aspect.

Comments on board, I'll have a re-jig and flick up an updated version shortly.