The Open Planets Foundation (OPF) addresses core digital preservation challenges by engaging with its members and the community to develop practical and sustainable tools and services to ensure long-term access to digital content.

Preservation capabilities: How to assess? How to improve?

Digital preservation is making progress in tool development, the gradual establishment of standards, and growing activity in user communities, but there are still few approaches for systematically assessing, comparing and improving how organizations go about achieving their preservation goals.

DROID file format identification using Hadoop

The DROID software tool is developed by The National Archives (UK) to perform automated batch identification of file formats by assigning PRONOM Unique Identifiers (PUIDs) and MIME types to files. The tool relies on so-called signature files, whose information is drawn from the PRONOM technical registry.
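To illustrate the general idea of signature-based identification, here is a minimal sketch. It is not DROID's implementation or API: the `SIGNATURES` table and the `identify` function are hypothetical, the table is hand-written rather than loaded from a PRONOM signature file, and it only checks leading bytes, whereas real signatures can also specify offsets and end-of-file patterns. The PUIDs shown are given for illustration and should be checked against the PRONOM registry.

```python
from typing import Optional, Tuple

# Hypothetical, hand-written signature table: (leading bytes, PUID, MIME type).
# A real tool would load these from a PRONOM signature file instead.
SIGNATURES = [
    (b"GIF87a", "fmt/3", "image/gif"),
    (b"GIF89a", "fmt/4", "image/gif"),
    (b"%PDF-1.4", "fmt/18", "application/pdf"),
]


def identify(data: bytes) -> Optional[Tuple[str, str]]:
    """Return (PUID, MIME type) for the first signature matching the start of data."""
    for magic, puid, mime in SIGNATURES:
        if data.startswith(magic):
            return puid, mime
    # No signature matched: the format is unknown to this toy table.
    return None
```

Calling `identify` on the first bytes of a file returns a `(PUID, MIME type)` pair when a signature matches and `None` otherwise, which mirrors the kind of per-file result a batch identification run collects.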

Here I present some considerations for using the tool on the Hadoop platform, together with a performance evaluation of job execution on a Hadoop cluster using the publicly available Govdocs1 corpus.