User story: archiving a FoxPro database

As a first encounter with database archiving, FoxPro is a nice challenge. This is the story of our first database archiving attempt.

The archive institution received a PC as part of a journalist’s archive, and amongst other things a custom-made database application was installed on it. While the PC was still working, it was quite old and looked like it could die at any moment. A Ghost image was created as soon as possible, but it was made with a version of Norton Ghost that is too old to convert into a virtual machine, so we can only explore the image. The custom application was created with FoxPro, but the application’s installation program is no longer available. Every attempt we made to get it running on another (virtual) machine failed: we were clearly missing some libraries in obscure places or some registry entries. Given how difficult it is to get the application to run, it quickly became clear that we needed to convert the database into another format for archiving and presentation purposes.

Our first attempt was to load the database into Access, but that was only a partial success. The table relations were nicely recovered and most of the data transferred cleanly into Access, but we had issues with character set and datatype conversions. The process requires the installation of a FoxPro ODBC driver, which can be a challenge of its own when working in a 64-bit Windows environment. If you are the happy owner of a recent machine with a 64-bit Windows OS and want to convert older databases, we strongly suggest you set up a virtual machine running 32-bit Windows XP: it will make your conversion life much easier. Windows 7 Professional, Enterprise and Ultimate users can download an XP virtual machine for free at http://www.microsoft.com/windows/virtual-pc/download.aspx. If you decide to live dangerously and stick with your 64-bit Windows, here’s a tip: the ODBC driver is 32-bit and is not visible in the ODBC Administration tool. Start C:\Windows\SysWOW64\odbcad32.exe instead. Good luck!
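For those who prefer scripting the extraction instead of clicking through Access, here is a minimal sketch of how the same 32-bit driver can be queried from Python with pyodbc. The driver name, the folder path and the table name are placeholders rather than values from our actual database, so adjust them to whatever the ODBC Administration tool reports on your machine.

```python
# Minimal sketch: reading a FoxPro free table through the 32-bit
# Visual FoxPro ODBC driver with pyodbc. Run it with a 32-bit Python
# interpreter, because the driver itself is 32-bit.
# Driver name, folder path and table name below are assumptions.
import pyodbc

conn_str = (
    "Driver={Microsoft Visual FoxPro Driver};"
    "SourceType=DBF;"
    "SourceDB=C:\\archive\\foxpro\\data;"   # folder containing the .dbf files
    "Exclusive=No;"
)

conn = pyodbc.connect(conn_str)
cur = conn.cursor()
cur.execute("SELECT * FROM articles")       # 'articles' is a hypothetical table name
for row in cur.fetchmany(5):
    print(row)
conn.close()
```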

In the meantime we were disappointed with the FoxPro to Access conversion. It is hard to set up on a recent PC and the results we got were not satisfactory. The database contains some memo fields; some of them are quite large and contain diacritical characters, and during inspection of the converted data we suspected parts of it were lost. But the main problem we faced was the number of steps needed to get the job done, because our goal is to automate the ingest procedures as much as possible. It is also not possible for a less digitally experienced user at the archiving institution to repeat our steps without a lot of help.

At the OPF Database Archiving Hackathon we learned that MS SQL Server would make things much easier, as it has built-in support for importing the data. We were very happy that our case was used for one of the breakout sessions. Installing MS SQL Server is not always easy, and we only managed to get the data into SQL Server by installing an ODBC/OLE DB driver for FoxPro; expect similar problems on 64-bit Windows as with Access. After migrating, most of the issues we had with the Access conversion also applied to the SQL Server scenario. Thanks to the colleagues in the breakout session we did get a result very fast, even if it was to conclude that converting to MS SQL Server did not change much. Apparently the FoxPro memo data is the source of most of our problems.
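The same migration can also be scripted rather than driven through the SQL Server import tooling, which fits our goal of automating the ingest. Below is a rough sketch of that idea with pyodbc; the driver names, server, database, table and column names are all assumptions you would need to replace, and the target table must already exist.

```python
# Rough sketch: copy rows from the FoxPro ODBC source into an existing
# SQL Server table. Run with 32-bit Python so the FoxPro driver loads.
# All driver/server/table/column names here are placeholders.
import pyodbc

fox = pyodbc.connect(
    "Driver={Microsoft Visual FoxPro Driver};SourceType=DBF;"
    "SourceDB=C:\\archive\\foxpro\\data;Exclusive=No;"
)
sql = pyodbc.connect(
    "Driver={SQL Server};Server=localhost;Database=ArchiveDB;"
    "Trusted_Connection=yes;"
)

src = fox.cursor()
dst = sql.cursor()

src.execute("SELECT id, title, body FROM articles")   # hypothetical columns
for row in src:
    dst.execute(
        "INSERT INTO dbo.articles (id, title, body) VALUES (?, ?, ?)",
        row.id, row.title, row.body,
    )
sql.commit()

fox.close()
sql.close()
```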

So we took a different approach. FoxPro storage is basically very close to the dBASE and DB2 storage formats, and there are many tools out there that do a good job of converting dBASE and/or DB2. We tried converting the database using MIXED and, although it is supposed to support FoxPro, it did not work. The differences in the main DBF format are well documented and seemed to match the files of our database perfectly. We looked at the memo data files and they matched the documented format as well. The format documentation is somewhat hidden in the MSDN knowledge base; here is the link to the latest version: http://msdn.microsoft.com/en-US/library/d863bcf2(v=vs.80).aspx. Be sure to read the updates in the comments, as there seem to be errors in the documentation. The DBF file format is best explained at http://www.dbf2002.com/dbf-file-format.html. For the other file formats, the MSDN documentation is your only source.
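To give an idea of how approachable the documented layout is, here is a small sketch that reads the fixed part of a DBF header and a single memo block from the matching FPT file, following the offsets in the documentation linked above. The file names and the block number are placeholders, so verify the offsets against the MSDN pages before relying on them.

```python
# Hedged sketch of the documented layout: parse the fixed-size DBF header
# and one memo block from the matching .fpt file using the struct module.
# File names and the block number are placeholders.
import struct

def read_dbf_header(path):
    with open(path, "rb") as f:
        head = f.read(32)
    file_type = head[0]                                 # e.g. 0x30 = Visual FoxPro
    yy, mm, dd = head[1], head[2], head[3]              # last update, year since 1900
    record_count, = struct.unpack("<I", head[4:8])      # little-endian
    header_len, record_len = struct.unpack("<HH", head[8:12])
    return {
        "file_type": hex(file_type),
        "last_update": (1900 + yy, mm, dd),
        "records": record_count,
        "header_length": header_len,
        "record_length": record_len,
    }

def read_memo_block(fpt_path, block_no):
    with open(fpt_path, "rb") as f:
        header = f.read(8)
        block_size, = struct.unpack(">H", header[6:8])  # FPT header values are big-endian
        f.seek(block_no * block_size)
        sig, length = struct.unpack(">II", f.read(8))   # signature 1 = text memo
        return sig, f.read(length)

print(read_dbf_header("data.dbf"))      # placeholder file names
print(read_memo_block("data.fpt", 8))   # the block number comes from the memo field in the DBF record
```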

We will be looking into open source dBASE file readers/converters like the MIXED drivers and adapt/upgrade them to handle FoxPro databases like ours. The next step will be to convert the database into an XML format (SIARD, DBML, anything that is standardized). The XML package will need to be documented, and we will be looking for an application that can add the metadata to the package and produce a complete SIP ready to be ingested. The last step will be the resurrection of the database into a live database for consultation purposes, depending on the requirements of the archiving institution. Still a long way to go, but at least we made significant progress during the Hackathon.
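To illustrate the direction, and nothing more, the sketch below dumps a set of rows into a very plain XML document using Python's standard library. This is not SIARD or DBML, just a simplified stand-in; the table and column names are invented for the example.

```python
# Simplified illustration only: serialise rows pulled from the source
# database into a plain XML document. Table and column names are placeholders.
import xml.etree.ElementTree as ET

def rows_to_xml(table_name, column_names, rows):
    root = ET.Element("table", name=table_name)
    for row in rows:
        rec = ET.SubElement(root, "record")
        for name, value in zip(column_names, row):
            col = ET.SubElement(rec, "column", name=name)
            col.text = "" if value is None else str(value)
    return ET.ElementTree(root)

tree = rows_to_xml("articles", ["id", "title", "body"],
                   [(1, "First article", "Some memo text...")])
tree.write("articles.xml", encoding="utf-8", xml_declaration=True)
```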

We learned that converting old databases on a 64-bit Windows platform is a bad idea, and that using other database formats as an intermediate step is only feasible if your old format is well supported and does not contain complex data or large text/binary blobs. For the more complex cases we suggest looking into drivers or tools that can read the data and convert it to the archival format directly, or adapting/creating them to make that work.

7 Comments

  1. badwanpk
    July 16, 2013 @ 2:32 pm CEST

    Just to raise another point:

    Of course technological access to a database is the primary issue. However, the problem does not end there. An archived database may not be understandable and usable because of the complexity of its structure and a lack of knowledge about the processes in which the data was generated. Sometimes a naive approach is used to add new functionality to database systems (some time after they were first developed) and standard relational modelling is not followed. This creates lots of issues in understanding the data in the future. Sometimes there are so many data quality issues in a database that they prevent meaningful use of the data. These data quality issues may come from a lack of data validation in the application component before inserting new records, or from database constraints that were never implemented.

    One proposal to address these issues is the model migration approach. The approach proposes to migrate a database to a dimensional model in the preservation procedure.

  2. badwanpk
    July 17, 2013 @ 12:24 pm CEST

    Hello Euan,

    Well, the approach basically involves migration of a relational database to a set of different stars. Each star (in general) corresponds to a business process. Adding some metadata about the business process would add more context to the database and make it easier to understand. So it might be called 'preserving a database at the business process level'.

    Currently we are working on defining some transformation rules for carrying out the model migration. The overall goal of the rules is to transfer as much of the data as possible to the target dimensional model. However, there may be parts of the data that are only required in the data capture phase of the database and are not important for preservation. The loss may then not actually be a loss, but rather a deliberate discarding of some data after analysis and consultation with the producer.

    Thanks for referring to the blog; there is an interesting discussion going on over there.

    Arif
