Breaking down walls in digital preservation (part 2)

Here is part 2 on the digital preservation seminar, which identified ways to break down the walls between research & development and daily operations in libraries and archives (continued from Breaking down walls in digital preservation, part 1). The seminar was organised by SCAPE and the Open Planets Foundation in The Hague on 2 April 2014.

Web-Scale Data Mining for Digital Preservation

Recent years have seen growing interest in data mining methods for finding structured information in very large collections of data ("Big Data"). In this complex and emerging field, the digital preservation community may play an interesting role:

ARC to WARC migration: How to deal with de-duplicated records?

In my last blog post about ARC to WARC migration, I compared the performance of two alternative approaches for migrating very large sets of ARC container files to the WARC format using Apache Hadoop. I noted that the next point to investigate was how to resolve contextual dependencies in order to create self-contained WARC files.