The SCAPE project is developing solutions to enable the processing of very large data sets with a focus on long-term preservation. One of the application areas is web archiving, where long-term preservation is directly relevant to several task areas, such as harvesting, storage, and access.
First things first. The GitHub repository with the Audio QA workflows is here: https://github.com/statsbiblioteket/scape-audio-qa. And version 1 is working. "Version" is really the wrong word here. I should call it Workflow 1, which is this one:
This event will focus on the issues that managers face when implementing digital preservation in their organisation. It will explore the tension between stable business processes and the introduction of new technologies. Many managers have a responsibility for digital preservation but are not necessarily technical experts in the field.
- Meet peers who are managing digital preservation
- Learn about the approach of others who are embedding digital preservation in business practices
- Hear about strategic approaches and policies in the field of digital preservation
- Meet experts in digital preservation
- Find out about the latest developments in research and development projects
Who should attend?
Managers with a responsibility for digital preservation in large or small organisations
One of my first blogs here covered an evaluation of a number of format identification tools. One of the more surprising results of that work was that out of the five tools that were tested, no fewer than four (FITS, DROID, Fido and JHOVE2) failed to even run when executed with their associated launcher script. In many cases the Windows launcher scripts (batch files) only worked when executed from the installation folder. Apart from making things unnecessarily difficult for the user, this also flies in the face of all existing conventions on command-line interface design. Around the time of this work (summer 2011) I had been in contact with the developers of all the evaluated tools, and until last week I thought those issues were a thing of the past. Well, was I wrong!
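To illustrate the launcher problem described above: a launcher typically breaks outside its installation folder because it looks up its own resources relative to the current working directory. The following is a minimal sketch of the usual remedy, which is to resolve paths relative to the launcher's own location. The function names `install_dir` and `resource_path` are hypothetical and not taken from any of the tools mentioned.

```python
import os


def install_dir(script_path):
    """Return the absolute directory that contains the launcher script.

    Symbolic links are resolved first, so the result points at the real
    installation folder even when the launcher is invoked through a link.
    """
    return os.path.dirname(os.path.realpath(script_path))


def resource_path(script_path, *parts):
    """Build the path to a bundled resource relative to the launcher itself,
    not relative to the caller's current working directory."""
    return os.path.join(install_dir(script_path), *parts)
```

A launcher written in Python could call `resource_path(__file__, "lib", "tool.jar")` (the resource name here is made up), and the result would be the same no matter which directory the user runs the launcher from.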
Anyone who wants to preserve digital content must be aware of events that might constitute a relevant risk. In SCAPE we are developing tools that will allow you to detect risks before they cause any irreversible damage.
Help us understand which preservation events, threats and opportunities you find most relevant, and how you would like us to detect them.
Participate in our survey and help us develop tools that would help you to automatically detect problems in your own content, and events that might put it at risk.
The survey has 30 short questions that should take about 10 minutes to complete.
This blog follows up on three earlier posts about detecting preservation risks in PDF files. In part 1 I explored to what extent the Preflight component of the Apache PDFBox library can be used to detect specific preservation risks in PDF documents. This was followed up by some work during the SPRUCE Hackathon in Leeds, which is covered by this blog post by Peter Cliff. Then last summer I did a series of additional tests using files from the Adobe Acrobat Engineering website. The main outcome of this more recent work was that, although showing great promise, Preflight was struggling with many more complex PDFs. Fast-forward another six months and, thanks to the excellent response of the Preflight developers to our bug reports, the most serious of these problems are now largely solved. So, time to move on to the next step!