The Open Planets Foundation (OPF) addresses core digital preservation challenges by engaging with its members and the community to develop practical and sustainable tools and services to ensure long-term access to digital content.

Some reflections on scalable ARC to WARC migration

The SCAPE project is developing solutions for processing very large data sets, with a focus on long-term preservation. One application area is web archiving, where long-term preservation bears directly on tasks such as harvesting, storage, and access.

A Nailgun for the Digital Preservation Toolkit

Nailgun has been mentioned in various forums before, but not necessarily expanded upon within this community. It is a client/server application that removes the overhead of starting the Java Virtual Machine each time a Java application is run repeatedly. Given that a large majority of the programs in the digital preservation toolkit are written in Java, we should consider every optimization we can find. Nailgun offers a significant improvement in performance and should be considered in future digital preservation workflows, if it is not being used already. This blog outlines the current performance issues with Java and provides an overview of how to get Nailgun up and running, giving baseline statistics along the way to illustrate the descriptions provided.
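To make the idea of a baseline statistic concrete, here is a minimal sketch of how the startup cost of a repeatedly launched command could be measured. It is not taken from the blog itself: the function names `time_launches` and `summarize` are hypothetical, and the default command simply starts the Python interpreter so the script runs anywhere.

```python
import statistics
import subprocess
import sys
import time


def time_launches(command, runs=5):
    """Launch `command` `runs` times; return the wall-clock seconds of each launch."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output so terminal I/O does not distort the measurement.
        subprocess.run(
            command,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to a few baseline figures."""
    return {
        "runs": len(timings),
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Assumption: the command to time is given on the command line.
    # With no arguments, a trivial interpreter start is timed instead.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(summarize(time_launches(command)))
```

Passing a command that starts a fresh JVM on every run (for example `java -version`) gives the kind of baseline that a Nailgun setup would be compared against. The matching Nailgun client invocation depends on the installation and is deliberately left out here.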

A spot of Gardening: Weeding the Open Planets Foundation Format Corpus

Just as a garden needs watering, it sometimes needs a little weeding too. I think that's where we've been recently with the Open Planets Foundation Format Corpus on GitHub. In this blog I describe how I've remixed it so that it can be used more flexibly in future, hopefully leaving it in a position to be forked and consumed again by the wider digital preservation community.

SCAPE QA Tool: Technologies behind Pagelyzer - II Web Page Segmentation

Web pages are becoming more complex than ever, which makes it difficult to identify their different elements, such as main content, menus, user comments, and advertising. Web page segmentation is the process of dividing a web page into visually and semantically coherent segments, called Blocks or Segments.