The Open Planets Foundation (OPF) addresses core digital preservation challenges by engaging with its members and the community to develop practical and sustainable tools and services to ensure long-term access to digital content.

Breaking down walls in digital preservation (part 2)

Here is part 2 on the digital preservation seminar, which identified ways to break down walls between research & development and daily operations in libraries and archives (continued from Breaking down walls in digital preservation, part 1). The seminar was organised by SCAPE and the Open Planets Foundation in The Hague on 2 April 2014.

Web-Scale Data Mining for Digital Preservation

Recent years have seen ever-increasing interest in developing data mining methods that allow us to find structured information of interest in very large collections of data ("Big Data"). In this complex and emerging field, the digital preservation community may play an interesting role:

ARC to WARC migration: How to deal with de-duplicated records?

In my last blog post about ARC to WARC migration, I compared the performance of two alternative approaches for migrating very large sets of ARC container files to the WARC format using Apache Hadoop. I said that the next point to investigate was resolving contextual dependencies in order to create self-contained WARC files.

CSV Validator - beta releases

For quite some time at The National Archives (UK) we've been working on a tool for validating CSV files against a user-defined schema. We're now at the point of making beta releases of the tool generally available (1.0-RC3 at the time of writing), along with the formal specification of the schema language. The tool and source code are released under the Mozilla Public Licence version 2.0.