Identification

Scalable Environments for File Format Identification and Characterisation

This webinar provides an introduction to file format identification and characterisation tools which have been developed or extended as part of the SCAPE Project.

It covers the basic principles of file format identification, and shows how format information drives digital preservation workflows.

Participants will be given an overview of file format registries and their role in digital preservation, and will see demonstrations of identification and characterisation tools, including FIDO and Tika.

We will provide a Virtual Machine image with sample files and step-by-step worksheets so that participants can try the exercises for themselves, with support, after the webinar.

Learning outcomes (by the end of the webinar and exercises, participants will be able to):

  • Distinguish between different file types and identify the requirements for characterising each of them.
  • Carry out identification and characterisation experiments on example files.
  • Compare characterisation and identification tools and understand their advantages and disadvantages when used in different scenarios.


Session Lead: Carl Wilson, OPF
Date: Friday 25 October
Time: 12 noon BST / 13:00 CET
Duration: 1 hour (this includes the presentation and demonstrations; practical exercises can be carried out after the webinar).

There are 25 places available, which will be allocated on a first-come, first-served basis.

Date: 25 October 2013

FIDO News

Here’s a little news bulletin about FIDO, the open source file format identification tool from the OPF.

It seems that the use of FIDO has been growing over the last few months. I am getting responses by e-mail and through the GitHub issue tracker from all over the world, ranging from requests for help to suggestions for improvement and even some bug fixes. Thanks and please keep them coming!

Preservation capabilities: How to assess? How to improve?

Digital preservation is making progress in terms of tool development, the progressive establishment of standards and increasing activity in user communities, but there is still a lack of approaches for systematically assessing, comparing and improving how organizations go about achieving their preservation goals.

DROID file format identification using Hadoop

The DROID software tool is developed by The National Archives (UK) to perform automated batch identification of file formats by assigning PRONOM Unique Identifiers (PUIDs) and MIME types to files. The tool bases its identification on so-called signature files, which are derived from the PRONOM technical registry.
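The idea of signature-based identification can be illustrated with a minimal sketch. This is not DROID's implementation or its signature file format: the `Signature` type, the `SIGNATURES` table and the `identify` function are hypothetical names introduced here, the matching is reduced to a fixed byte sequence at the start of the file, and the PUIDs shown should be checked against the PRONOM registry before being relied on.

```python
from typing import NamedTuple, Optional


class Signature(NamedTuple):
    magic: bytes  # byte sequence expected at the very start of the file
    puid: str     # PRONOM Unique Identifier assigned on a match
    mime: str     # MIME type assigned on a match


# Illustrative table only (an assumption for this sketch); real signature
# files carry far richer patterns, offsets and priorities.
SIGNATURES = [
    Signature(b"GIF87a", "fmt/3", "image/gif"),
    Signature(b"GIF89a", "fmt/4", "image/gif"),
    Signature(b"%PDF-1.4", "fmt/18", "application/pdf"),
]


def identify(data: bytes) -> Optional[Signature]:
    """Return the first signature whose magic bytes start the data, else None."""
    for sig in SIGNATURES:
        if data.startswith(sig.magic):
            return sig
    return None
```

Calling `identify` on the first few bytes of a file returns the matching entry, from which the PUID and MIME type can be read, or `None` when no signature in the table applies.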

Here I present some considerations for using the tool on the Hadoop platform, together with a performance evaluation of job execution on a Hadoop cluster using the publicly available Govdocs1 corpus.

Webinar – C3PO, an introduction to content profiling

This webinar will focus on content profiling and preservation planning. It aims to address the following questions:

  • Why do we need identification and characterisation?
  • How can we use the metadata that these processes provide?

It will cover some of the available tools and include a demonstration of the C3PO (Clever, Crafty, Content Profiling of Objects) tool, along with an explanation of how you can analyse the metadata it produces.

Session Lead: Petar Petrov, Creative Pragmatics

Time: 13:00 BST / 14:00 CET

Register at: http://opf-c3po.eventbrite.co.uk/

Date: 31 May 2013

We don’t do migration for the future; we do it for the present: Emulation and an ever so slightly unsatisfying success story

A reminder about obsolescence at Archives New Zealand. Re-evaluating emulation and migration and a digital preservation report card for a file format that simply says: Could. Do. Better.

OPF Hackathon: Tackling Real-World Collection Challenges with Digital Forensics Tools and Methods

This Hackathon will give participants a chance to learn what digital forensics has to offer the digital preservation community by getting hands-on with data, tools and new techniques. Our expert facilitators will be on hand to guide you through these challenging concepts and processes in a manner that is easy to understand and apply.

The agenda for the event is set by you: the collection owners. You will bring with you disk images of digital data you want to assess, analyze and preserve. As part of a team, and with support from experts, you’ll apply forensics tools to solve your digital preservation challenges.

The technical problem solving and software development will be delivered by you: the developers. You will bring with you your laptop-based development setup and your best coding skills. Working closely with other developers, and our experts, you’ll apply forensics and other software tools to solve the digital preservation challenges provided by the practitioners.

For further information about the event, including the agenda, how to prepare for the event, and advice on travel and accommodation, please visit our event wiki page: http://wiki.opf-labs.org/display/KB/2013-06-03+OPF+Hackathon+-+Tackling+Real-World+Collection+Challenges+with+Digital+Forensics+Tools+and+Methods+%28Chapel+Hill%29.

Date: 3 June 2013 to 5 June 2013