This blog post is a response to willp-bl's post "Mixing Hadoop and Taverna" and builds on some of the ideas I presented in my blog post "Big data processing: chaining Hadoop jobs using Taverna".
In 2002 the UK government introduced regulation that required all UK local authorities to provide the British Library with a copy of the electoral register every year. However, the legislation did not require this data to be provided in any particular format and, as a result, the data is sent to the British Library in a variety of digital formats.
We are pleased to welcome two new affiliate organisations to our membership: Portico, a digital preservation service, and the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill, a number-one-ranked school in the United States.
If you are, what are you retaining and why?
If not, why not?
There is more to come from us on this topic, but for now I'd love to hear any opinions or thoughts.
And what do I mean by technical provenance?
Good question. I mean any filename sanitisation, QA changes to (meta)data, file structure moves, normalisation data, or details of any technical process that has touched the original bitstream as it was found (at rest, if applicable) on its source medium.
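To make that a little more concrete, here is a minimal sketch of what retaining one such event might look like, using filename sanitisation as the example. Everything in it is an assumption for illustration: the `ProvenanceEvent` record, its field names, the `sanitise_filename` rule and the `record_rename` helper are all hypothetical and do not come from any particular preservation system or standard.

```python
import hashlib
import re
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ProvenanceEvent:
    """Hypothetical record of one technical change made to an object."""
    event_type: str      # e.g. "filename_sanitisation"
    original_value: str  # the value as found on the source medium
    new_value: str       # the value after the change
    source_sha256: str   # checksum of the untouched original bitstream
    timestamp: str       # when the change was made (UTC, ISO 8601)


def sanitise_filename(name: str) -> str:
    """Illustrative rule only: replace anything outside A-Z, a-z, 0-9, '.', '_', '-'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def record_rename(original_name: str, content: bytes) -> ProvenanceEvent:
    """Sanitise a filename and keep a record of what it was before."""
    return ProvenanceEvent(
        event_type="filename_sanitisation",
        original_value=original_name,
        new_value=sanitise_filename(original_name),
        source_sha256=hashlib.sha256(content).hexdigest(),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


event = record_rename("my report (final).doc", b"abc")
print(event.original_value, "->", event.new_value)
```

The point of the sketch is simply that the original name and a checksum of the original bitstream are kept alongside the change, so the change can be explained or reversed later. Whether that is worth retaining is exactly the question I am asking.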