SCAPE

Introducing Flint

Hi, this is my first blog post in which I want to introduce the project I am currently working on: Flint.

History

Flint (File/Format Lint) has developed out of DRMLint, a lightweight piece of Java software that makes use of different third-party tools (Preflight, iText, Calibre, Jhove) to detect DRM in PDF files and EPUBs. Since its initial release we have added validation of files against an institutional policy, making use of Johan’s pdfPolicyValidate work, and restructured it to be modular and easily extensible. Along the way we found that we had developed a rather generic file format validation framework.

SCAPE Demo Day at Statsbiblioteket

Statsbiblioteket (The State and University Library, Aarhus, hereafter called SB) welcomed a group of people from The Royal Library, The National Archives, and Danish e-Infrastructure Cooperation on June 25, 2014. They were invited to our SCAPE Demo Day, where some of SCAPE’s results and tools were presented. Bjarne S.

Will the real lazy pig please scale up: quality assured large scale image migration

Authors: Martin Schaller, Sven Schlarb, and Kristin Dill

In the SCAPE Project, the memory institutions are working on practical application scenarios for the tools and solutions developed within the project. One of these application scenarios is the migration of a large image collection from one format to another.

Webinar: Tools for uncovering preservation risks in large repositories

Overview
An important part of digital preservation is analysing content to uncover the risks that hinder its preservation. This analysis entails answering many diverse questions, for example: Which file formats do I have? Are there any invalid files? Are there any files violating my defined policies?
 
The threats to preserving content come from many distinct domains, from technological to organizational, economic and political, and can relate to the content holder, the producers or the target communities for which the content is primarily intended.
 
Scout, the preservation watch system, centralizes all the necessary knowledge on the same platform, cross-referencing this knowledge to uncover all preservation risks. Scout automatically fetches information from several sources to populate its knowledge base. For example, Scout integrates with C3PO to get large-scale characterization profiles of content. Furthermore, Scout aims to be a knowledge exchange platform, to allow the community to bring together all the necessary information into the system. The sharing of information opens new opportunities for joining forces against common problems.
 
This webinar demonstrates how to identify preservation risks in your content and, at the same time, share your content profile information with others to open new opportunities.
 
Learning outcomes
In this webinar you will learn how to:
  • characterise collections and use C3PO to easily inspect the content characteristics
  • integrate C3PO with Scout and publish content profiles online
  • use Scout to automatically monitor your content profile
  • monitor preservation risks by cross-referencing your content profile with policies, information from the world, and even content profiles from peers
There are 23 places available on a first come, first served basis.
Date: Thursday 26 June 2014
Time: 14:00 BST / 15:00 CEST
Duration: 1 hour
Session Lead: Luis Faria, KEEP SOLUTIONS

A Weekend With Nanite

Well over a year ago I wrote the “A Year of FITS” (http://www.openplanetsfoundation.org/blogs/2013-01-09-year-fits) blog post describing how we, over the course of 15 months, characterised 400 million harvested web documents using the File Information Tool Set (FITS) from Harvard University. I presented the technique and the technical metadata and basically concluded that FITS didn’t fit that kind of heterogeneous data in such large amounts. In the time that has passed since that experiment, FITS has been improved in several areas, including the code base and the organisation of its development, and it could be interesting to see how far it has evolved for big data. Still, FITS is not what I will be writing about today. Today I’ll present how we characterised more than 250 million web documents, not in 9 months, but during a weekend.

Using Kanban at the SCAPE Developer Workshop

The SCAPE project is into its final 6 months, and with that came our final developer workshop. The main focus of this event was demonstrations, productisation and sustainability. However, with everyone together, it was also an opportune time to make progress with other SCAPE-related activities.

Breaking down walls in digital preservation (part 2)

Here is part 2 of the digital preservation seminar, which identified ways to break down walls between research & development and daily operations in libraries and archives (continued from Breaking down walls in digital preservation, part 1). The seminar was organised by SCAPE and the Open Planets Foundation in The Hague on 2 April 2014.