A Format Registry for SCAPE

In my previous post on formats, I ended up leaning towards a wait-and-see approach to format registry design. Unfortunately, I don’t really have that luxury. The SCAPE project needs to collect more format information to assist preservation planning and other processes. We even have some effort available to help build and/or fill a registry. But which registry should we try to fill? Or should we go it alone and make a new one?

I don’t really want to recommend that SCAPE makes a new format registry if we can contribute efficiently to an existing one instead. But which registry? There are so many to choose from…

…and that’s just the ones developed by our own community! Some of the most well-used ones belong to the wider world, and contain almost exactly the same information…

While our registries do contain a reasonable amount of good data, we know we don’t have enough. Why are our own repositories not brimming with the information we need? Is it uncertainty about what the right information is? Are we unsure where best to invest our efforts, leading to a kind of well-meaning deadlock? Or are we simply failing to assign enough time and effort to this task?

The OPF’s proposed solution is to remove any publication bottlenecks via an entirely de-centralised ‘ecosystem’ approach. This implies that SCAPE should go ahead and do its own thing, but publish the information as linked data so it can be merged with the other sources. But given we already have an ecosystem of incompatible registries, I’m not convinced this is really the best way forward. Perhaps it would make more sense to try and bridge the gaps between them?

5 comments

Jay Gattuso wrote 16 weeks 14 hours ago

Where do we go from here?

Very interesting post Andy, thank you.

As I implied in an earlier comment, we have been mulling over the exact function/purpose/shape/requirements of a Format Registry here in New Zealand.

We reached some similar conclusions as you; however, I thought it might be useful to reflect a little on the position we find ourselves in. Through our work on teasing out some of the things that work, and some of the things that don’t, we derived some general principles that we felt reflect the position we should adopt.

  • Transparency – All changes should be visible, and flagged prior to committal
  • History should be visible and referenceable
  • Everything should be exportable in a structured interoperable format (CSV -> Linked data)
  • There should be a governance model that is transparent and accessible to those motivated to participate
  • The data should be structured in a declared and open way, and any deviation is only allowed via agreement with the governance process. The conceptual data structure is community owned and free of IP restrictions.
  • Modular design principles should be used to ensure that discrete record sets can be used in isolation of the rest of the registry (e.g. Format Registry is separate from the application library etc)
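To make the "CSV -> Linked data" export principle above a little more concrete, here is a minimal sketch of turning a CSV export into N-Triples. The `http://example.org/registry/` namespace, the column names, and the sample rows are all invented for illustration; they do not come from any existing registry, and a real export would reuse an agreed vocabulary.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace and column names, used only for illustration.
BASE = "http://example.org/registry/"


def escape_literal(value: str) -> str:
    """Escape a string so it can be written as an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text: str) -> list[str]:
    """Emit one triple per non-empty column of each CSV row."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The 'id' column becomes the subject URI of the record.
        subject = f"<{BASE}format/{quote(row['id'], safe='')}>"
        for column, value in row.items():
            if column == "id" or not value:
                continue
            predicate = f"<{BASE}vocab/{quote(column, safe='')}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Made-up sample data: two format records, one with no extension recorded.
SAMPLE = (
    "id,name,extension\n"
    "fmt-a,Example Image Format,exi\n"
    "fmt-b,Example Text Format,\n"
)

for line in csv_to_ntriples(SAMPLE):
    print(line)
```

Empty cells are skipped rather than exported as empty literals, so a missing value stays missing in the linked data as well.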

For us, we assert there is a fundamental need for a consistent and community-agreed mechanism for ring-fencing digital objects that share significant technical structures and (slightly less importantly) technical properties.

Perhaps now is the time to go back to first principles and encourage some discussion and agreement (or otherwise) on the founding principles that we should collectively be building on.

I think there are four classes of ‘player’ currently, and we all fit into at least one of these types, if not more.

  1. Those who are ‘holding out’ because they see format-registries as a service being a viable future revenue stream. (I will also include those content owners who hold back data on their proprietary formats for commercial reasons).
  2. Those who can’t share because their local data is just not in an accessible state, or there is a lack of resources to support this activity.
  3. Those who passively consume the offerings on the table, without being in a position to add more value to the core resources they are leveraging.
  4. Those who are not yet far enough down the road of running a large-scale operational digital preservation system to understand what their future needs might be.

In terms of trying to get something valuable and worthwhile we can all use and reuse efficiently (i.e. what you seek for SCAPE, what we want for NLNZ, etc.), I think a large part of the challenge is in figuring out how we can overcome or address the limitations described above, and at minimum form a community-wide set of principles that means we are all heading in the same direction.

I’ll leave you with this – is there a model of a functional international community that has come together altruistically to converge on a ‘standard’?

We are all waiting for something magical to appear from the ether. Perhaps it’s time we institutionally start to commit to a single open concept, and commit funding to support what is a costly and complex process: defining a trustworthy registry of digital object descriptions…

 

andy jackson wrote 15 weeks 6 days ago

Standardisation communities…

One standardisation effort that springs to mind, mainly because I was involved in it, is a bunch of physicists (ILDG) agreeing on mark-up standards for sharing experimental results (called QCDml). As I remember it, the process there ran along the lines of one lead organisation proposing the schema, and then another attempting to implement it, with additional cycles of meetings/workshops where the design was reviewed by all parties and the feedback used to iterate the design.

Personally, I rather like the idea of the OPF hosting a working group similar to those supported by the W3C standards process, but combined with the IETF’s focus on “rough consensus and running code”.

mauricederooij wrote 16 weeks 1 hour ago

vocabulary

In essence I think it is necessary to start creating a basic vocabulary. Next to that we would need some kind of thesaurus to bridge the gap between the already existing registries.

A good starting point would be the draft vocabulary specification from TNA’s PRONOM linked data project.

gmcgath wrote 3 weeks 1 day ago

Decentralizing registries

GDFR was supposed to be a decentralized system, but it got bogged down in the complexities of implementing that sort of system. Perhaps it didn’t go far enough in decentralization; it assumed that all the repositories would be mirrors of one another and would be kept in sync.

A more viable alternative might be service software on which anyone could set up their own registry, with the capability of importing from any other registry. (Granting the right to do this might be a condition of the software license.) There are specialists in small pieces of the format world, and they could build up a good registry within their own area of expertise without worrying about other formats.

But do we want to build Yet Another Repository Service? I don’t think so, unless someone with serious clout is behind it, and the people with serious clout are mostly wrapped up in their own projects. So at this point I don’t really have an answer.

Jay Gattuso wrote 3 weeks 1 day ago

Then centralising record frameworks?

Does this suggest that a generic / homogeneous file format record framework is the key?

If we can share library/registry content easily (through standardised description records) and derive a method of individualising/uniquely identifying where contributions came from, we’ve moved the game from being (externally – unless you run one) managed registries to an ‘open’, ‘shared’, ‘opt-in’ approach.
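As a rough illustration of shared records that remember where each contribution came from, the sketch below lets one registry import another's records while keeping every contributor's version side by side. The `FormatRecord` and `Registry` names, the fields, and the identifiers are all hypothetical; this is not the record structure of any existing registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FormatRecord:
    format_id: str  # hypothetical shared identifier for a format
    name: str
    source: str     # name of the registry that contributed this record


class Registry:
    def __init__(self, name: str) -> None:
        self.name = name
        # format_id -> {source registry name -> record}
        self._records: dict[str, dict[str, FormatRecord]] = {}

    def _store(self, record: FormatRecord) -> None:
        self._records.setdefault(record.format_id, {})[record.source] = record

    def add(self, format_id: str, name: str) -> None:
        """Add a locally authored record, tagged with this registry's name."""
        self._store(FormatRecord(format_id, name, self.name))

    def export(self) -> list[FormatRecord]:
        """Return every record held, including ones imported from elsewhere."""
        return [r for by_source in self._records.values()
                for r in by_source.values()]

    def import_from(self, other: "Registry") -> None:
        """Copy another registry's records without overwriting local ones."""
        for record in other.export():
            self._store(record)

    def contributions(self, format_id: str) -> list[str]:
        """List which registries have contributed a record for this format."""
        return sorted(self._records.get(format_id, {}))


national = Registry("registry-a")
specialist = Registry("registry-b")
national.add("fmt-a", "Example Image Format")
specialist.add("fmt-a", "Example Image Format (specialist notes)")
specialist.add("fmt-c", "Example Audio Format")

national.import_from(specialist)
print(national.contributions("fmt-a"))
```

Keeping records keyed by both format and source means an import never silently replaces someone else's description; deciding which version to trust is left as a separate, visible step.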

We’ll need to bash out a few generalised descriptors, and lean on the repository suppliers to adopt the record structure, oh, and then individually/institutionally commit to keeping our records available or throw them into a big pot…

It’s all doable; the question is, do we collectively have the appetite?
