SCAPE

SCAPE/OPF Continuous Integration update

As Carl has previously blogged, virtually all SCAPE and OPF projects are now in Continuous Integration, being built and unit tested in both Travis CI and Jenkins.

  • Travis compiles the projects and executes unit tests whenever a new commit is pushed to GitHub, or when a pull request is submitted to the project.
  • Jenkins builds are generally scheduled once per day. After a build, the software has its code quality analysed by Sonar.

Scalable Environments for File Format Identification and Characterisation

This webinar provides an introduction to file format identification and characterisation tools which have been developed or extended as part of the SCAPE Project.

It covers the basic principles of file format identification, and shows how format information drives digital preservation workflows.

Participants will be given an overview of file format registries, and their role in digital preservation, and will see demonstrations of identification and characterisation tools including fido and tika.
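
As a rough illustration of what signature-based identification means, the sketch below matches the leading bytes of a file against a small table of well-known "magic numbers". It is a minimal example only, not how fido or tika are implemented: the signature table is hand-picked and the function names (identify, identify_file) are hypothetical. Real tools draw on much larger signature sets from format registries and also look beyond the start of the file.

```python
# Hypothetical, hand-picked signature table: (magic bytes at offset 0, label).
# Real identification tools use far larger, registry-backed signature sets.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"PK\x03\x04", "application/zip"),
]


def identify(data: bytes) -> str:
    """Return a label for the first signature that the data starts with."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Note that a ZIP signature alone cannot distinguish a plain archive from container formats built on ZIP, which is one reason identification and deeper characterisation are treated as separate steps.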

We will provide a Virtual Machine image with sample files and step-by-step worksheets so that participants can try out these exercises for themselves, with support, after the webinar.

Learning outcomes (by the end of the webinar and exercises, participants will be able to):

  • Distinguish between different file types and identify the requirements for characterising each of them.
  • Carry out identification and characterisation experiments on example files.
  • Compare characterisation and identification tools and understand their advantages and disadvantages when used in different scenarios.


Session Lead: Carl Wilson, OPF
Date: Friday 25 October
Time: 12 noon BST / 13:00 CET
Duration: 1 hour (this covers the presentation and demonstrations; practical exercises can be carried out after the webinar).

There are 25 places available, which will be allocated on a first come, first served basis.

Date: 25 October 2013

Let’s benchmark our Hadoop clusters (join in!)

Introduction

For our evaluations within SCAPE it would be useful to be able to measure quantitatively the performance of the Hadoop clusters available to us, so that results from each cluster can be compared.

Fortunately, the standard Hadoop distribution includes some examples that can be run as tests. Intel has produced a benchmarking suite – HiBench – that uses those included Hadoop examples to produce a set of results.

Interview with a SCAPEr – Rui Castro


Who are you?

I’m Rui Castro. I have worked at KEEP SOLUTIONS since 2010, where I have the roles of Director of Infrastructures, project manager and researcher. Before joining KEEP SOLUTIONS, I was part of the team that developed RODA, the digital preservation repository used by the Portuguese National Archives.

Identification of PDF preservation risks: the sequel