Migration

Quattro Pro for DOS: an obsolete format at last?

While browsing ArchiveTeam’s File Formats Wiki earlier this week, I came across some entries I created there on Quattro Pro spreadsheets two years ago. At the time I had also contributed some old Quattro Pro for DOS spreadsheets (here and here) from my personal archives to the OPF format corpus. Seeing those files again, I decided to spend an afternoon trying to access them using modern-day software. This turned out to be more challenging than expected. It even made me wonder whether, at long last, I had run into a case of the much-discussed (but rarely observed) phenomenon of format obsolescence. Yes, big words indeed, and if anyone would like to prove me wrong, the comments section below is your friend!

Webinar: Long-term preservation with dArceo

Overview:

Multiple cultural heritage institutions in Poland are involved in digitisation activities, together building a network of over 100 Polish digital libraries and making over 2 million digital objects accessible online (http://fbc.pionier.net.pl/). Since 1999 the PSNC R&D centre (located in Poznan, Poland) has been supporting these activities with innovative technologies related to digitisation, metadata aggregation and ground truth production. One of the results, closely related to digitisation activities, is the “Digitise and Go” (DInGO, http://dingo.psnc.pl/) toolset, composed of online access systems (dLibra or dMuseion), a digitisation workflow management system (dLab) and a long-term preservation system (dArceo).

This webinar will give an overview of the DInGO toolset with a special focus on the dArceo long-term preservation tool, including its main features, general architecture and production-mode case studies. The ultimate goal of dArceo is to assure long-term preservation of master files using the OAIS transformation approach (migration). dArceo runs as a service, which provides a simple way to integrate it with external systems (e.g. digitisation workflow management). To preserve data, dArceo uses many already available preservation tools (e.g. FITS, DROID, UDFR, ffmpeg, JHOVE) and well-known formats (e.g. METS, PREMIS, MIX, AES57, TextMD). There are already several production-mode deployments of dArceo in Poland, including very small institutions, universities and large consortia. The dArceo code base was initiated within the framework of a national project funded by the Polish National Centre for Research and Development. PSNC is currently in the process of releasing dArceo as open source.

Session Lead: Tomasz Parkola, Poznan Supercomputing and Networking Centre

Duration: 1 hour

Time: 11:00 GMT / 12:00 noon CET

There are 23 places available on a first-come, first-served basis. Priority registration will be given to OPF members.

Registration will open soon.

Date: 4 December 2014

Six ways to decode a lossy JP2

Some time ago Will Palmer, Peter May and Peter Cliff of the British Library published a really interesting paper that investigated three different JPEG 2000 codecs, and their effects on image quality in response to lossy compression. Most remarkably, their analysis revealed differences not only in the way these codecs encode (compress) an image, but also in the decoding phase. In other words: reading the same lossy JP2 produced different results depending on which implementation was used to decode it.

A limitation of the paper’s methodology is that it obscures the individual effects of the encoding and decoding components, since both are essentially lumped together in the analysis. Thus, it’s not clear how much of the observed degradation in image quality is caused by the compression, and how much by the decoding. This made me wonder how similar the decode results of different codecs really are.
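As a rough illustration of what comparing decode results could look like, here is a minimal Python sketch that computes the peak signal-to-noise ratio (PSNR) between two sets of decoded samples. PSNR is used only as a familiar example metric; it is not necessarily the measure used in the paper or in this post. The function name, the sample values and the assumption of 8-bit samples (max_value=255) are all hypothetical.

```python
import math

def psnr(pixels_a, pixels_b, max_value=255):
    """Peak signal-to-noise ratio (in dB) between two equal-length sample sequences."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("both images must have the same number of samples")
    if not pixels_a:
        raise ValueError("images must not be empty")
    # Mean squared error over all samples
    mse = sum((a - b) ** 2 for a, b in zip(pixels_a, pixels_b)) / len(pixels_a)
    if mse == 0:
        return math.inf  # the two decode results are identical
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical samples, standing in for the same JP2 decoded by two codecs
decoded_by_codec_a = [10, 20, 30, 40]
decoded_by_codec_b = [10, 21, 30, 39]

print(psnr(decoded_by_codec_a, decoded_by_codec_a))  # identical input: infinite PSNR
print(psnr(decoded_by_codec_a, decoded_by_codec_b))  # small differences: high but finite PSNR
```

In practice the samples would come from whatever image library is used to read each decoder's output; that step is left out here on purpose.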

Weirder than old: The CP/M File System and Legacy Disk Extracts for New Zealand’s Department of Conservation

We’ve been doing legacy disk extracts at Archives New Zealand for a number of years, with much of the effort that enables this work coming from colleague Mick Crouch and former Archives New Zealand colleague Euan Cochrane. Earlier this year, we received some disks from New Zealand’s Department of Conservation (DoC), which we successfully imaged and from which we extracted what the department needed. While it was a pretty straightforward exercise, enough about it was cool to warrant using this blog to document another facet of the digital preservation work we’re doing, especially in the spirit of being another war story that others in the community can refer to. We conclude with a few thoughts about where we still relied on a little luck, which we’ll have to keep in mind moving forward.

When (not) to migrate a PDF to PDF/A

It is well known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating these risks. However, the benefits of this approach are often questionable, and the migration process can also be quite risky in itself. As I often get questions on this subject, I thought it might be worthwhile to do a short write-up on this.

BSDIFF: Technological Solutions for Reversible Pre-conditioning of Complex Binary Objects

Documented provenance and the ability for researchers to locate and view original versions of digital records as transferred into an archive are concepts central to archival theory. The continuing ability to enable this is challenged by the number of digital records we’re facing; a requirement to follow good digital preservation practice; a need to provide access; the complexity of modern file formats; and the cost of doing all of it. Technological solutions and techniques borrowed from other disciplines can help reduce costs from the transfer process through to the maintenance of digital objects in a digital repository, without compromising the integrity required by archival theory. Using binary diffs and binary patching mechanisms is one such solution that can reduce costs and provide a sound method of documenting all file modifications, from the trivial to the complex, enabling the original record to always be recovered.
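The idea of a reversible modification can be sketched in a few lines of Python. This is not the bsdiff algorithm or its patch format; it is a toy byte-level diff built on the standard library's difflib.SequenceMatcher, suitable only for small inputs. The function names and the sample byte strings are hypothetical. The point it illustrates is that keeping a patch from the modified file back to the original means the original can always be rebuilt.

```python
from difflib import SequenceMatcher

def make_patch(source: bytes, target: bytes):
    """Return a list of operations that rebuilds `target` from `source`."""
    matcher = SequenceMatcher(None, source, target, autojunk=False)
    patch = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes shared by both files: record only the source offsets
            patch.append(("copy", i1, i2))
        elif j2 > j1:
            # "replace" or "insert": store the bytes that exist only in target
            patch.append(("insert", target[j1:j2]))
        # "delete" needs no operation: those source bytes are simply skipped
    return patch

def apply_patch(source: bytes, patch) -> bytes:
    """Rebuild the target by replaying the patch against `source`."""
    out = bytearray()
    for op in patch:
        if op[0] == "copy":
            out += source[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)

# Hypothetical record as transferred, and a pre-conditioned version of it
original = b"HEADER\x00legacy payload\x00TRAILER"
preconditioned = b"HEADER\x00normalised payload\x00TRAILER"

# Store the pre-conditioned file plus a patch that leads back to the original
reverse_patch = make_patch(preconditioned, original)
assert apply_patch(preconditioned, reverse_patch) == original
```

A production workflow would use a dedicated binary diff tool and would record a checksum of the original alongside the patch, so that a successful recovery can be verified.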

Will the real lazy pig please scale up: quality assured large scale image migration

Authors: Martin Schaller, Sven Schlarb, and Kristin Dill

In the SCAPE Project, the memory institutions are working on practical application scenarios for the tools and solutions developed within the project. One of these application scenarios is the migration of a large image collection from one format to another.

Preserving PDF: identify, validate, repair

Overview
This event will focus on the PDF file format. Participants are encouraged to contribute requirements, for instance sample files with errors or anomalies for investigation. Currently available identification and validation tools will be demonstrated, with the opportunity to compare results using your own collections and identify gaps for future development.
OPF members have identified specific tasks for the event:
  • check the validity of the files and whether they are encrypted;
  • perform quality assurance checks after migration, using comparison tools;
  • investigate error messages, repair the problems, and build a knowledge base; and
  • document and improve open source tool functionality e.g. JHOVE validation.
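As a very rough illustration of the first task, the Python sketch below performs three byte-level checks on a PDF: whether a %PDF- header appears near the start, whether an %%EOF marker appears near the end, and whether the file mentions an /Encrypt key. This is a heuristic only and is no substitute for a validator such as JHOVE: it can report false positives (for example, when /Encrypt appears inside ordinary content) and it says nothing about structural validity. The function name, the 1024-byte windows and the sample bytes are assumptions made for the example.

```python
def quick_pdf_check(data: bytes) -> dict:
    """Crude byte-level checks on a PDF file; a heuristic, not a validator."""
    head = data[:1024]   # many readers tolerate the header anywhere in this window
    tail = data[-1024:]  # the end-of-file marker is expected near the end
    return {
        "has_header": b"%PDF-" in head,
        "has_eof_marker": b"%%EOF" in tail,
        # An /Encrypt entry in the trailer usually signals an encrypted file
        "mentions_encrypt": b"/Encrypt" in data,
    }

# Hypothetical, heavily truncated sample; real files would be read with
# open(path, "rb").read()
sample = b"%PDF-1.4\ntrailer\n<< /Root 1 0 R /Encrypt 5 0 R >>\n%%EOF\n"
print(quick_pdf_check(sample))
```

Files flagged by a check like this would still need to go through proper identification and validation tools before any conclusion is drawn.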
There will also be discussion sessions, and the opportunity to share experiences with peer organisations.
Olaf Drümmer, Chairman of the PDF Association / CEO of callas software GmbH / DIN delegate to all PDF related working groups in ISO TC 171 and ISO TC 130 since 1999, will present the work of the ISO standards body, including efforts related to PDF and PDF/A, and share the industry perspective on tool development.
Why attend?
  • Learn about PDF and PDF/A standards
  • Document and prioritise known preservation problems with PDF files
  • Assess state of the art identification and validation tools
  • Test the tools on sample files and compare the results
  • Define organisational requirements and policies for conformance
  • Identify requirements for future development work (road-mapping)
  • Help improve current PDF tools (hacking)
Who should attend?
Collection owners with a responsibility to preserve PDFs. Bring along your problem files!
Developers interested in hacking PDF identification and validation tools.
Agenda
Registration
OPF members are invited free-of-charge (please use the code issued to your main point of contact at your organisation). Non-members are welcome at the rate of EUR 150.
Register at: https://www.eventbrite.co.uk/e/preserving-pdf-identify-validate-repair-registration-12203790867
Date: 1 September 2014 to 2 September 2014