In my previous post on formats, I ended up leaning towards a wait-and-see approach to format registry design. Unfortunately, I don’t really have that luxury. The SCAPE project needs to collect more format information to assist preservation planning and other processes. We even have some effort available to help build and/or fill a registry. But which registry should we try to fill? Or should we go it alone and make a new one?
I don’t really want to recommend that SCAPE makes a new format registry if we can contribute efficiently to an existing one instead. But which registry? There are so many to choose from…
…and that’s just the ones developed by our own community! Some of the most well-used ones belong to the wider world, and contain almost exactly the same information…
While our registries do contain a reasonable amount of good data, we know we don’t have enough. Why are our own repositories not brimming with the information we need? Is it uncertainty about what the right information is? Are we unsure where best to invest our efforts, leading to a kind of well-meaning deadlock? Or are we simply failing to assign enough time and effort to this task?
The OPF’s proposed solution is to remove any publication bottlenecks via an entirely de-centralised ‘ecosystem’ approach. This implies that SCAPE should go ahead and do its own thing, but publish the information as linked data so it can be merged with the other sources. But given we already have an ecosystem of incompatible registries, I’m not convinced this is really the best way forward. Perhaps it would make more sense to try and bridge the gaps between them?
Where do we go from here?
Very interesting post Andy, thank you.
As I implied in an earlier comment, we have been mulling over the exact function/purpose/shape/requirements of a Format Registry here in New Zealand.
We reached some conclusions similar to yours; however, I thought it might be useful to reflect a little on the position we find ourselves in. Through our work on teasing out some of the things that work, and some of the things that don’t, we derived some general principles that we felt reflect the position we should adopt.
For us, there is a fundamental need for a consistent and community-agreed mechanism for ring-fencing digital objects that share significant technical structures and (slightly less importantly) technical properties.
Perhaps now is the time to go back to first principles and encourage some discussion and agreement (or otherwise) on the founding principles that we should collectively be building on.
I think there are four classes of ‘player’ currently, and we all fit into at least one of these types, if not more.
In terms of trying to get something valuable and worthwhile that we can all use and reuse efficiently (i.e. what you seek for SCAPE, what we want for NLNZ, etc.), I think a large part of the challenge is in figuring out how we can overcome or address the limitations described above, and at minimum form a community-wide set of principles that means we are all heading in the same direction.
I’ll leave you with this – is there a model of a functional international community that has come together altruistically to converge on a ‘standard’?
We are all waiting for something magical to appear from the ether. Perhaps it’s time we institutionally start to commit to a single open concept, and commit funding to support what is a costly and complex process: defining a trustworthy registry of digital object descriptions…
Standardisation communities…
One standardisation effort that springs to mind, mainly because I was involved in it, is a bunch of physicists (ILDG) agreeing on mark-up standards for sharing experimental results (called QCDml). As I remember it, the process there ran along the lines of one lead organisation proposing the schema, and then another attempting to implement it, with additional cycles of meetings/workshops where the design was reviewed by all parties and the feedback used to iterate the design.
Personally, I rather like the idea of the OPF hosting a working group similar to those supported by the W3C standards process, but combined with the IETF’s focus on “rough consensus and running code”.
Vocabulary
In essence, I think it is necessary to start by creating a basic vocabulary. Alongside that, we would need some kind of thesaurus to bridge the gap between the already existing registries.
A good starting point would be the draft vocabulary specification from TNA’s PRONOM linked data project.
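To make the thesaurus idea a little more concrete, here is a minimal sketch in Python of a crosswalk that records which identifiers in different registries are believed to describe the same format. Everything in it is an assumption made for illustration: the `FormatId` and `Thesaurus` names, the registry labels and the identifiers are all hypothetical, and none of it is taken from the PRONOM vocabulary draft or any existing registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatId:
    """An identifier as issued by one registry (both fields are hypothetical)."""
    registry: str
    local_id: str


class Thesaurus:
    """Groups identifiers from different registries under a shared concept label."""

    def __init__(self):
        self._concept_of = {}   # FormatId -> concept label
        self._members = {}      # concept label -> set of FormatId

    def assert_same(self, concept, *ids):
        """Record that every given identifier refers to the same concept."""
        for fid in ids:
            existing = self._concept_of.get(fid)
            if existing is not None and existing != concept:
                raise ValueError(f"{fid} is already mapped to {existing!r}")
            self._concept_of[fid] = concept
            self._members.setdefault(concept, set()).add(fid)

    def translate(self, fid, target_registry):
        """Return the target registry's identifiers for the same concept."""
        concept = self._concept_of.get(fid)
        if concept is None:
            return []
        return sorted(
            m.local_id
            for m in self._members[concept]
            if m.registry == target_registry
        )


if __name__ == "__main__":
    thesaurus = Thesaurus()
    # Placeholder identifiers, not real registry entries.
    thesaurus.assert_same(
        "example-format",
        FormatId("registry-a", "a/0001"),
        FormatId("registry-b", "B-42"),
    )
    print(thesaurus.translate(FormatId("registry-a", "a/0001"), "registry-b"))
```

The sketch deliberately refuses to map one identifier to two concepts, since deciding whether two registries really mean the same thing is exactly the part that would need community agreement.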
Decentralizing registries
GDFR was supposed to be a decentralized system, but it got bogged down in the complexities of implementing that sort of system. Perhaps it didn’t go far enough in decentralization; it assumed that all the repositories would be mirrors of one another and would be kept in sync.
A more viable alternative might be service software on which anyone could set up their own registry, with the capability of importing from any other registry. (Granting the right to do this might be a condition of the software license.) There are specialists in small pieces of the format world, and they could build up a good registry within their own area of expertise without worrying about other formats.
But do we want to build Yet Another Repository Service? I don’t think so, unless someone with serious clout is behind it, and the people with serious clout are mostly wrapped up in their own projects. So at this point I don’t really have an answer.
Then centralising record frameworks?
Does this suggest that a generic/homogeneous file format record framework is the key?
If we can share library/registry content easily (through standardised description records) and derive a method of uniquely identifying where contributions came from, we’ve moved the game from (externally, unless you run one) managed registries to an ‘open’, ‘shared’, ‘opt-in’ approach.
We’ll need to bash out a few generalised descriptors, and lean on the repository suppliers to adopt the record structure, oh, and then individually/institutionally commit to keeping our records available or throw them into a big pot…
It’s all doable; the question is, do we collectively have the appetite?
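As a rough illustration of what attributable, poolable records might look like, here is a small Python sketch. It assumes, purely for the sake of the example, that contributors already agree on a shared key per format and on property names; the `Statement`, `pool` and `conflicts` names, the contributor labels and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One claim about a format, tagged with who contributed it."""
    format_key: str   # assumed shared identifier for the format
    prop: str         # assumed shared property name
    value: str
    contributor: str  # where the contribution came from


def pool(*contributions):
    """Merge contributions, keeping the set of contributors behind each value."""
    merged = defaultdict(lambda: defaultdict(set))
    for statements in contributions:
        for s in statements:
            merged[(s.format_key, s.prop)][s.value].add(s.contributor)
    return merged


def conflicts(merged):
    """Return the properties for which contributors supplied differing values."""
    return {
        key: {value: sorted(who) for value, who in values.items()}
        for key, values in merged.items()
        if len(values) > 1
    }


if __name__ == "__main__":
    # Placeholder data, not taken from any real registry.
    from_a = [Statement("example-format", "extension", ".ex", "institution-a")]
    from_b = [
        Statement("example-format", "extension", ".ex", "institution-b"),
        Statement("example-format", "mime", "application/x-example", "institution-b"),
    ]
    from_c = [Statement("example-format", "mime", "application/example", "institution-c")]
    print(conflicts(pool(from_a, from_b, from_c)))
```

Because every value carries its contributors, the pooled view can show agreement and disagreement without any one institution having to act as gatekeeper.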