We have just set up a Vagrant environment for C3PO. It starts a headless VM in which the C3PO-related functionality (MongoDB, Play, a downloadable command-line jar) is manageable from the host's browser. Further, the VM itself has all relevant processes configured at start-up independently of Vagrant, so once created it can be downloaded and used as a stand-alone C3PO VM. We think this scenario could be applicable to other SCAPE projects as well. The following is a summary of the ideas we've had and the experience we've gained.

New QA tool for finger detection on scans

I would like to draw your attention to the new QA tool for finger detection on scans. This tool was developed by AIT within the scope of the SCAPE project.


Manually checking scans for fingers is a very time-consuming and error-prone process. You need a tool to help you: Fingerdet.

Fingerdet is an open source tool which:

The SCAPE Project video is out!

Do you want a quick intro to what SCAPE is all about?

Then you should watch the new SCAPE video!

The video will be used at upcoming SCAPE events such as SCAPE demonstration days and workshops, and it will be available on Vimeo for everyone to use. You can help us to disseminate this SCAPE video by tweeting using this link


Introducing Flint

Hi, this is my first blog post, and in it I want to introduce the project I am currently working on: Flint.


Flint (File/Format Lint) has developed out of DRMLint, a lightweight piece of Java software that makes use of different third-party tools (Preflight, iText, Calibre, JHOVE) to detect DRM in PDF files and EPUBs. Since its initial release we have added validation of files against an institutional policy, making use of Johan’s pdfPolicyValidate work, and restructured it to be modular and easily extensible. Along the way we found that we had developed a rather generic file format validation framework.

SCAPE Demo Day at Statsbiblioteket

Statsbiblioteket (The State and University Library, Aarhus, hereafter called SB) welcomed a group of people from The Royal Library, The National Archives, and the Danish e-Infrastructure Cooperation on June 25, 2014. They were invited to our SCAPE Demo Day, where some of SCAPE’s results and tools were presented. Bjarne S.

Will the real lazy pig please scale up: quality assured large scale image migration

Authors: Martin Schaller, Sven Schlarb, and Kristin Dill

In the SCAPE Project, the memory institutions are working on practical application scenarios for the tools and solutions developed within the project. One of these application scenarios is the migration of a large image collection from one format to another.

Webinar: Tools for uncovering preservation risks in large repositories

An important part of digital preservation is analysing content to uncover the risks that hinder its preservation. This analysis entails answering diverse questions, for example: Which file formats do I have? Are there any invalid files? Are there any files violating my defined policies? And there are many others.
The threats to preserving content come from many distinct domains, from technological to organizational, economic and political, and can relate to the content holder, the producers or the target communities for which the content is primarily intended.
Scout, the preservation watch system, centralizes all the necessary knowledge on the same platform, cross-referencing this knowledge to uncover all preservation risks. Scout automatically fetches information from several sources to populate its knowledge base. For example, Scout integrates with C3PO to get large-scale characterization profiles of content. Furthermore, Scout aims to be a knowledge exchange platform, to allow the community to bring together all the necessary information into the system. The sharing of information opens new opportunities for joining forces against common problems.
This webinar demonstrates how to identify preservation risks in your content and, at the same time, share your content profile information with others to open new opportunities.
Learning outcomes
In this webinar you will learn how to:
  • characterise collections and use C3PO to easily inspect the content characteristics
  • integrate C3PO with Scout and publish content profiles online
  • use Scout to automatically monitor your content profile
  • monitor preservation risks by cross referencing your content profile with policies, information from the world, and even content profiles from peers
There are 23 places available on a first-come, first-served basis.
Date: Thursday 26 June
Time: 14:00 BST / 15:00 CET
Duration: 1 hour
Session Lead: Luis Faria, KEEP SOLUTIONS
26 June 2014
A Weekend With Nanite

Well over a year ago I wrote the “A Year of FITS” blog post describing how we, during the course of 15 months, characterised 400 million harvested web documents using the File Information Tool Set (FITS) from Harvard University. I presented the technique and the technical metadata and basically concluded that FITS didn’t fit that kind of heterogeneous data in such large amounts. In the time that has passed since that experiment, FITS has been improved in several areas, including the code base and the organisation of its development, and it could be interesting to see how far it has evolved for big data. Still, FITS is not what I will be writing about today. Today I’ll present how we characterised more than 250 million web documents, not in 9 months, but during a weekend.