I spent last week at the Open Planets Foundation Database Archiving event at the Statens Arkvier in Copenhagen. There was a good mix of people, both technical and practicioners, with some informative presentations, and it was interesting to see the different approaches to the problem.
To my mind there appeared to be two distinct classes of use case:
Preserve the database and the environment allowing authentic access to the information and any accompanying applications.
In this case the requirement is to preserve an entire system, i.e.
This case appears to cry out for an emulation solution, but there are potential problems:
The second case is really a subset of the first:
Extract / migrate the raw data and table structure from the original database.
While this approach is pragmatic the main drawback lies in what isn't extracted. There will be information that can't be migrated to a a new database, I can't think of a single Stored Procedure dialect that doesn't break ANSI SQL in multiple ways.
Any attempt to preserve a complex system using emulation will require a lot of bespoke work per system preserved. It's difficult to see how much of the work involved can be generalised
The serialisation approach is more widely used. There were several demonstrations of serialisation tools given at the event. The problem is that none of the systems demonstrated used a common serialisation format. The SIARD format was used by at least two of the tools demonstrated, but in some cases the implementers had adapted the format to suit their own needs. This negates the advantage of using a common format, i.e. interoperability.
If I could pick a common format, I'd choose SIARD, it's been in use for a few years and the format is to be submitted as a candidate ISO standard. I'll have more to say about SIARD in a follow up post next week.