Will the real lazy pig please scale up: quality assured large scale image migration

Authors: Martin Schaller, Sven Schlarb, and Kristin Dill

In the SCAPE Project, the memory institutions are working on practical application scenarios for the tools and solutions developed within the project. One of these application scenarios is the migration of a large image collection from one format to another.

An Analysis Engine for the DROID CSV Export

I have been working on some code to ensure the accurate and consistent output of any file format analysis based on the DROID CSV export. The tool produces summary information about any DROID export and more detailed listings for content of interest such as files with potentially problematic file names or duplicate content based on MD5 hash value. I describe some of the rationale and ask for advice on where to go next.
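
As an illustration of the duplicate listing described above, here is a minimal sketch and not the tool's actual code. It assumes the export has columns named FILE_PATH, TYPE and MD5_HASH; column names can differ between DROID versions, so check them against the header row of your own export. The function name find_duplicates and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; verify against the header row of your own export.
HASH_COL = "MD5_HASH"
PATH_COL = "FILE_PATH"
TYPE_COL = "TYPE"


def find_duplicates(csv_file):
    """Group file paths by hash and keep only hashes that occur more than once."""
    by_hash = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Folders and rows without a hash cannot be content duplicates.
        if row.get(TYPE_COL) == "Folder" or not row.get(HASH_COL):
            continue
        by_hash[row[HASH_COL]].append(row[PATH_COL])
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


# Made-up sample rows, for illustration only.
SAMPLE = """FILE_PATH,NAME,TYPE,MD5_HASH
/data,data,Folder,
/data/a.tif,a.tif,File,aaa
/data/b.tif,b.tif,File,bbb
/data/copy/a.tif,a.tif,File,aaa
"""

duplicates = find_duplicates(io.StringIO(SAMPLE))
for md5, paths in duplicates.items():
    print(md5, paths)
```

For a real export, open the file with open(path, newline="", encoding="utf-8") and pass the file object in place of the in-memory sample.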

A Weekend With Nanite

Well over a year ago I wrote the “A Year of FITS” blog post, describing how we spent 15 months characterising 400 million harvested web documents using the File Information Tool Kit (FITS) from Harvard University. I presented the technique and the technical metadata, and basically concluded that FITS didn’t fit that kind of heterogeneous data in such large amounts. Since that experiment, FITS has been improved in several areas, including the code base and the organisation of its development, and it would be interesting to see how far it has evolved for big data. Still, FITS is not what I will be writing about today. Today I’ll present how we characterised more than 250 million web documents, not in 9 months, but during a weekend.