Here is part 2 of the digital preservation seminar which identified ways to break down walls between research & development and daily operations in libraries and archives (continued from Breaking down walls in digital preservation, part 1). The seminar was organised by SCAPE and the Open Planets Foundation in The Hague on 2 April 2014.
People & knowledge are the keys to breaking down the walls between daily operations and digital preservation (DP) within our organisations. DP is not a technical issue, but information technology must be embraced as a core feature of the digital library.
Recent years have seen an ever-increasing interest in developing Data Mining methods that allow us to find structured information of interest in very large collections of data ("Big Data"). In this complex and emerging field, the digital preservation community may play an interesting role:
In my last blog post about ARC to WARC migration, I compared the performance of two alternative approaches for migrating very large sets of ARC container files to the WARC format using Apache Hadoop. I noted there that resolving contextual dependencies in order to create self-contained WARC files was the next point to investigate.
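To make the migration concrete, the sketch below (stdlib only, not the Hadoop implementation discussed in the post) shows the core of an ARC-to-WARC record mapping: a version-1 ARC record header line is a space-separated list of URL, IP address, archive date, content type, and length, which maps onto the corresponding WARC named headers. The function name and example values are illustrative only.

```python
# Minimal sketch: map a v1 ARC record header line to WARC record headers.
# ARC v1 field layout: URL IP-address Archive-date Content-type Archive-length
def arc_header_to_warc(header_line):
    url, ip, date, content_type, length = header_line.split(' ')
    # ARC dates are YYYYMMDDhhmmss; WARC-Date uses the W3C ISO 8601 profile
    warc_date = (f"{date[0:4]}-{date[4:6]}-{date[6:8]}T"
                 f"{date[8:10]}:{date[10:12]}:{date[12:14]}Z")
    return {
        'WARC-Type': 'response',
        'WARC-Target-URI': url,
        'WARC-IP-Address': ip,
        'WARC-Date': warc_date,
        'Content-Type': content_type,
        'Content-Length': length,
    }

headers = arc_header_to_warc(
    'http://example.org/ 192.0.2.1 20140402120000 text/html 1234')
```

In practice a library such as warcio (whose `ArchiveIterator` can read ARC input) does this mapping for you; the point here is only that the header translation itself is mechanical, while the contextual dependencies mentioned above are what make the records genuinely self-contained.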
For quite some time at The National Archives (UK) we've been working on a tool for validating CSV files against user-defined schemas. We're now at the point of making beta releases of the tool generally available (1.0-RC3 at the time of writing), along with the formal specification of the schema language. The tool and source code are released under the Mozilla Public License version 2.0.
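To illustrate the general idea of schema-driven CSV validation (this is a toy sketch, not the CSV Schema language or the tool's actual API), a "schema" can be modelled as a mapping from column names to validation predicates, with each failing cell reported by row and column:

```python
import csv
import io
import re

# Toy "schema": column name -> predicate. The rules shown are invented
# for illustration and do not reflect the real CSV Schema syntax.
schema = {
    'id': lambda v: v.isdigit(),
    'email': lambda v: re.fullmatch(r'[^@\s]+@[^@\s]+', v) is not None,
}

def validate(csv_text, schema):
    """Return a list of (row_number, column, value) validation failures."""
    errors = []
    # Data rows start at line 2; line 1 is the header.
    for row_num, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        for column, check in schema.items():
            value = row.get(column, '')
            if not check(value):
                errors.append((row_num, column, value))
    return errors

data = "id,email\n1,alice@example.org\nx,bob-at-example\n"
errors = validate(data, schema)  # row 3 fails both the id and email rules
```

The real schema language is declarative rather than programmatic, but the underlying model is the same: per-column rules evaluated against every row, with precise error locations reported.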