SCAPE

Identifying ebooks (File ID Hackathon)

Several of us at The British Library took part in the CURATEcamp file id hackathon on Friday.

We decided that one issue we could make a useful impact on was identification of various ebook formats. eBooks are an important content type for the British Library, especially with the expected implementation of non-print legal deposit legislation next year. For a long list of formats look here: http://wiki.mobileread.com/wiki/E-book_formats

C3PO: a content profiling tool for preservation analysis

Over the last few months, I have been researching the problem of large-scale content profiling for preservation analysis. I do this for a number of reasons. For one, I support the opinion that formats are just another property: undoubtedly a very important one, but knowing which formats you have is not sufficient for good preservation planning and actions.

Watching Stars

I'm not an astronomer but if I were I'd probably get excited watching the birth of a star. What I do get excited about is being around to watch the creation and evolution of a digital preservation problem right here and now.

Experimentation, simulation and prediction in DP: Emerging challenges

As many of you may know, Cal Lee, Andi Rauber and I recently attempted to facilitate a broad discussion on emerging research challenges within the DP community at a

Building a digital preservation toolkit for digital curators

In the context of digital information, many curation tasks need to be performed to ensure continuous access to information. As digital assets grow in size and number, tools must be deployed to ease the execution of common digital preservation tasks and so make the whole digital preservation process more manageable.

Running Apache Tika over ARC files using Apache Hadoop

In the context of the SCAPE project, we have recently been running a series of experiments on identifying the files held in ARC.GZ web archive containers. Why? Because you will presumably be interested in which file formats you have in your archive containers and how many files there are of each type.
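The "how many of them per type" question boils down to a map and reduce over identified types. The following is a minimal single-machine sketch of that tally, not the Hadoop job used in the experiments, and it does not call Apache Tika. The input is assumed to be a list of media type strings that some identification tool has already produced; `map_record`, `reduce_counts` and the sample values are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def map_record(identified_type: str) -> Tuple[str, int]:
    """Map step: emit (type, 1), lower-casing and dropping parameters such as charset."""
    base = identified_type.split(";", 1)[0].strip().lower()
    return (base or "unknown", 1)


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """Reduce step: sum the emitted counts for each type."""
    totals: Counter = Counter()
    for media_type, count in pairs:
        totals[media_type] += count
    return totals


# Made-up identification results for the files in one container (assumption).
sample_types = [
    "text/html; charset=UTF-8",
    "image/jpeg",
    "text/html",
    "application/pdf",
    "IMAGE/JPEG",
    "",
]

totals = reduce_counts(map_record(t) for t in sample_types)

if __name__ == "__main__":
    for media_type, count in totals.most_common():
        print(media_type, count)
```

In a distributed setting the same two steps would run per container and per type key, with the framework handling the grouping in between.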

The practitioners have spoken: "We need better characterisation!"

On Monday I was asked to speak at an experts workshop aimed at steering developments in preservation services on the Reponet+ Project (part of JISC Innovation Zone).

File Identification using Fido and the UDFR Registry

Task:

I primarily wanted to get an understanding of SPARQL queries and how they can be used to query linked data. As a focus for my work, I set myself a challenge to get Fido working using signatures from the UDFR registry.

SCAPE Project Training Event: Keeping Control - Scalable Preservation Environments for Identification and Characterisation



One of the biggest initial challenges to digital preservation is file format identification. While there has been a lot of work in this area, the ever-changing nature of digital formats realistically means the problem will never be "solved". This first SCAPE training course will give you the knowledge and experience to confidently choose file format identification and characterisation tools that have been developed or extended during the SCAPE project.

Date: 6 December 2012 to 7 December 2012
Location: Archaeological Museum of the Martins Sarmento Society, Rua Paio Galvão, Guimarães, Portugal

Automated assessment of JP2 against a technical profile

I've already written a number of blog posts on format validation of JP2 files. Format validation is only one aspect of a quality assessment workflow. Digitisation guidelines typically impose various constraints on the technical characteristics of preservation and access images. For example, they may state that a preservation master must be losslessly compressed, and that its progression order must be RPCL. A format profile is a set of such technical constraints.
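To make the idea of a profile concrete, here is a minimal sketch that treats a profile as a set of expected property values and reports every mismatch. It is not the assessment method described in the full post. The property names `lossless` and `progression_order`, the function `check_against_profile` and the sample values are all hypothetical; in practice the properties would come from a characterisation tool's output.

```python
from typing import Any, Dict, List

# Hypothetical profile for a preservation master, using the two example
# constraints from the text: lossless compression and RPCL progression order.
PRESERVATION_MASTER_PROFILE: Dict[str, Any] = {
    "lossless": True,
    "progression_order": "RPCL",
}


def check_against_profile(properties: Dict[str, Any],
                          profile: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable failures; an empty list means the image conforms."""
    failures: List[str] = []
    for key, expected in profile.items():
        if key not in properties:
            failures.append(f"{key}: missing (expected {expected!r})")
        elif properties[key] != expected:
            failures.append(f"{key}: found {properties[key]!r}, expected {expected!r}")
    return failures


if __name__ == "__main__":
    # Made-up property sets standing in for two extracted JP2 characterisations.
    conforming = {"lossless": True, "progression_order": "RPCL", "layers": 1}
    lossy = {"lossless": False, "progression_order": "LRCP"}

    print(check_against_profile(conforming, PRESERVATION_MASTER_PROFILE))
    for failure in check_against_profile(lossy, PRESERVATION_MASTER_PROFILE):
        print(failure)
```

Properties that the profile does not mention, such as `layers` above, are ignored, so a single set of extracted properties can be checked against separate master and access profiles.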
