As part of the evaluation framework I'm developing for OPF and SCAPE, I've been working on gathering a corpus of files to run experiments against.
Although Govdocs1 would seem like a good place to start, there are a few problems:
1) It's too big. One million files is just showing off.
2) It's full of repeats! There are over 700,000 PDF files.
3) Running experiments on a million files full of repeats generates too much data (yes, there is such a thing). One way to thin the set out is sketched below.
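To give a flavour of what that thinning might look like, here is a minimal sketch: drop byte-identical repeats by content hash, then sample a fixed number of files per format. This is not the actual corpus-building code; the directory layout, the per-format sample size, and the use of file extensions as a format proxy are all assumptions of mine.

```python
import hashlib
import random
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large corpus files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def dedupe_and_sample(corpus_dir: str, per_format: int = 100) -> list[Path]:
    """Keep one copy of each distinct file, then sample up to
    per_format files per extension to keep the test set manageable."""
    seen: set[str] = set()
    by_ext: dict[str, list[Path]] = {}
    for path in Path(corpus_dir).rglob("*"):
        if not path.is_file():
            continue
        digest = sha256_of(path)
        if digest in seen:
            continue  # byte-identical repeat, skip it
        seen.add(digest)
        by_ext.setdefault(path.suffix.lower(), []).append(path)
    sample: list[Path] = []
    for paths in by_ext.values():
        sample.extend(random.sample(paths, min(per_format, len(paths))))
    return sample
```

Even something this crude cuts the experiment runs down from a million files to a few hundred per format, which keeps the output data at a size a human can actually look at.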
The scientific and R&D communities are talking more and more about collecting and reusing data and datasets, and archiving and preserving that content is core to both.
Sharing data is key for many reasons.
To make the generation of Debian packages easy, OPF has created, and pays to host, a number of Amazon AMIs that anyone can launch. These AMIs are already set up to build the package automatically: their only function is to download the latest release (by tag number), build it, and put it on the server's web page so that you can download it.
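The build step on the AMI boils down to roughly the following flow. This is a hedged sketch of that idea rather than the actual AMI contents: the repository URL, the paths, and the exact packaging invocation here are illustrative assumptions (the sketch assumes a Debian-style source tree whose build dependencies are already installed on the image).

```python
import subprocess
from pathlib import Path

# Hypothetical repository URL and web root; the real AMIs will differ.
REPO_URL = "https://github.com/example/example-tool.git"
WEB_ROOT = Path("/var/www/html")

def run(*cmd: str, cwd: Path | None = None) -> str:
    """Run a command, failing loudly if it exits non-zero."""
    return subprocess.run(cmd, cwd=cwd, check=True,
                          capture_output=True, text=True).stdout.strip()

def build_latest_release() -> None:
    src = Path("/tmp/build/src")
    src.parent.mkdir(parents=True, exist_ok=True)
    run("git", "clone", REPO_URL, str(src))
    # Find the most recent tag and build exactly that release.
    tag = run("git", "describe", "--tags", "--abbrev=0", cwd=src)
    run("git", "checkout", tag, cwd=src)
    run("dpkg-buildpackage", "-us", "-uc", cwd=src)
    # dpkg-buildpackage writes the .deb into the parent directory;
    # publish it where the AMI's web server can serve it.
    for deb in src.parent.glob("*.deb"):
        deb.rename(WEB_ROOT / deb.name)
```

The appeal of baking this into an AMI is that anyone can launch one and get an identical, repeatable build without installing the Debian toolchain locally.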
Since joining the project in July 2011 I have focused on aligning a number of different groups and outputs so that they are consistent and maintainable into the future. In this way I feel my role is not only to support OPF but to use it as a platform to support the ongoing digital preservation targets of others outside the immediate OPF and SCAPE project communities.