Representation Information Registries: OPF and UDFR

On 13 and 14 April 2011 I took part in the Unified Digital Format Registry (UDFR) stakeholder meeting in Washington DC. The UDFR team had invited around 25 people, mainly from the institutions represented on the UDFR governing body, plus a few others, including me, representing the National Archives of the Netherlands and the OPF.

The UDFR team presented their progress and plans for the project and invited feedback, which led to a lively discussion.

Apologies for taking so long to write this up for the OPF blog: holidays and other commitments have got in the way over the last couple of weeks.


For the details of the meeting, take a look at the documents on the UDFR Wiki. To summarise very briefly: in January 2012 the UDFR project team plans to deliver a representation information registry system with the following main features:

  • Any registered user can contribute information
  • There will be a user interface for adding, viewing and maintaining the information
  • Detailed provenance will be provided for each item of information in the system
  • There will be a review process, but unreviewed information will also be stored and provided (and the provenance info will help you decide how much trust to place in it)
  • The information will be published in RDF as Linked Data, and the data will also be stored natively as RDF
  • There will be a SPARQL endpoint for querying the data
  • Versioning of the data will be supported
  • The details of the data model are still in development, but it is likely to be mostly, if not fully, compatible with the data model of TNA's PRONOM registry
  • A feed of updates to the repository will be provided
  • The software produced will be open source

How does this match up against the ‘registry ecosystem’ described in the OPF paper we published a couple of months back, “A New Registry for Digital Preservation: Conceptual Overview”?

My feeling is that it fits very well, and it’s great to see the community starting to come together around a common vision of how the next generation of file format registries should look.

Probably the most important motivation for the ‘new registry’ work is that the effort of researching and documenting file formats needs to be shared among many institutions, which in turn requires an effective system for sharing the new information they produce.

The UDFR system is intended to support multiple contributors from multiple institutions, so in itself it could provide a significant part of the solution. However, the UDFR team also recognizes that there will be other registries holding similar information, and is considering how to work alongside them. In particular, the Linked Data version of TNA’s PRONOM system is expected to be ready some time around September 2011 (there is already a working prototype).

With both UDFR and TNA planning to use Linked Data, two significant contributions are on the way for the kind of registry ecosystem we described.

There are still missing pieces of the puzzle, however, and the OPF plans to work towards providing them. Firstly, we need guidelines for any organization that wants to set up its own representation information registry. (Such an organization might choose to use UDFR, but it’s important for the diversity of the ecosystem that other options are also possible.)

Secondly, an organization that runs a digital repository needs a convenient way to access information from multiple registries and to use it to determine its own preservation policies and workflows. In the Conceptual Overview paper we outlined an ‘Ecosystem Dashboard’ to support this.

So the next step for the OPF is to work on the Guidelines and on the design of the Dashboard (leading on to building a first version of it). We plan to publish documents on each of these this summer and will consult widely for input. Canvassing a wide range of views is especially important for the Guidelines, to ensure that they meet a broad range of requirements and gain a high degree of support. The active development projects of UDFR and TNA will be very useful inputs to this, but we’ll also be talking to many others. We’ll keep you up to date on progress via the OPF blogs.