Planet DigiPres

Statement by G8 ministers on international issues that require global cooperation

Alliance for Permanent Access News - 14 June 2013 - 9:54am

From Juan Bicarregui (STFC):

The link below leads to this week’s statement by G8 ministers on “international issues that require global cooperation”.

https://www.gov.uk/government/news/g8-science-ministers-meet-in-london

The four topics discussed were:

  • global challenges
  • global research infrastructure
  • open scientific research data
  • increasing access to the peer-reviewed, published results of scientific research

The third and fourth topics will be of particular interest to APA and APARSEN members.

Categories: Planet DigiPres

APARSEN @ iPRES 2013

Alliance for Permanent Access News - 11 June 2013 - 11:28am

Interoperability of Persistent Identifier Systems – services across PI domains
Date: Thursday, 5th September 2013, afternoon
Location: IST – Instituto Superior Técnico in Lisbon, Portugal

Programme Committee:
Maurizio Lunghi and Emanuele Bellini (Fondazione Rinascimento Digitale/APARSEN), Renè Van Horik (DANS), Barbara Bazzanella and Paolo Bouquet (UNITN), Bas Cordewener (Knowledge Exchange), Anila Angjeli (ISNI), Giovanni Bergamin (Central National Library in Florence), Norman Pasquin (LCC and DOI foundation), John Kunze (California Digital Library), Tobias Weigel (RDA), Antoine Isaac (Europeana), (NN, EUDAT)

Workshop Goal:
The central goal of this second edition of the workshop on Interoperability of Persistent Identifier Systems (www.rinascimento-digitale.it/workshopPI2012) is to bring together representatives from different PI communities to discuss potential benefits for end users, as well as the challenges, requirements and technologies involved in implementing an effective interoperability solution for different PI systems and related services. The supporters of this workshop proposal and the experts on the programme committee represent large and significant PI user communities; further experts are pending confirmation.
A first session is devoted to users, and to the potential services and benefits for end users that could be built on such an interoperability framework. Participants will be involved in describing future user scenarios and potential applications of PI systems, making user benefits and requirements explicit.
A second session focuses on technical aspects of implementing an interoperability solution and related services. As a starting point for the technical discussion, the new Interoperability Framework (IF) for PI systems, proposed by the APARSEN project and refined by a large group of independent experts, will be described and a demonstrator presented. The IF model is intended to suit all the different user requirements and to be adoptable by all PI user communities.
Participants are invited to compare their requirements with the IF’s features and assumptions, debating various aspects of the model, its potential benefits, and concrete terms for a common roadmap for implementing the framework, in order to build consensus on developing joint cross-domain applications.
Representatives of the most relevant PI initiatives and of different PI user communities are invited to report on their current activities and vision, as well as on possible approaches to defining interoperability solutions and services, and to state their position on the needs and opportunities of moving toward a comprehensive technological interoperability solution for PI systems.

Joint APARSEN/4C workshop:
What does it cost? – EU Activities to Assess the Cost of Digital Curation

Date: Thursday, 5th September 2013, afternoon
Location: IST – Instituto Superior Técnico in Lisbon, Portugal

Subject of the Workshop:
Digital preservation and curation activities tend to be costly and complex, and they require a long-term commitment. Without reliable cost information and accurate cost models, it is hard to plan and budget for such activities seriously.
A number of digital curation cost models have been developed in recent years, and initiatives such as Knowledge Exchange, the Digital Curation Centre and the Blue Ribbon Task Force for Economically Sustainable Digital Curation have, among others, looked at the costs and benefits of keeping digital data available for the long term. Most recently, the APARSEN project has provided a high-level analysis of published cost models, reviewed cost parameters in relation to the trusted digital repositories standard ISO 16363, and investigated how well prepared research libraries are to ensure economically sustainable digital preservation.
The new EU project 4C – ‘the Collaboration to Clarify the Costs of Curation’ – draws all of these initiatives and their results together. It networks existing projects and initiatives and will ensure that, where existing work is relevant, stakeholders realize it and understand how to employ it. A key aim for this workshop is to build bridges between ongoing costs-related initiatives to enable 4C to identify areas where good progress has been made and also to understand how current cost models might be augmented to improve ease of use and increase uptake. Ultimately, 4C will help organizations to approach their investment in data curation and preservation with greater certainty and with greater clarity about what they will get back in return. The project partners will use the workshop as an opportunity to set the scene for their topic, present their approach (“assess, enhance, engage”) and invite feedback from workshop participants.

A key point for the open discussion session will be to identify difficulties that ongoing costs-related initiatives have had in collecting cost information and encouraging use of their models. During the session we will invite input from these initiatives into how 4C might help to overcome these difficulties to realise increased uptake of the models and ultimately an improved understanding of curation costs.

Categories: Planet DigiPres

APARSEN @ Metaday #59 Vienna

Alliance for Permanent Access News - 10 June 2013 - 1:38pm

Veronika Prändl-Zika, Austrian National Library, presented the APARSEN project in the course of the Metaday #59 at the Metalab in Vienna on June 7, 2013. For more details: https://metalab.at/wiki/Metaday_59

Categories: Planet DigiPres

Thanks, Debbie!

Files That Last - 7 June 2013 - 10:03pm

From Twitter:

Writers & illus: one backup isn’t enough. Find *excellent* tips in FILES THAT LAST by @garym03062: bit.ly/15HSXgU

— Debbie Ridpath Ohi (@inkyelbows) June 7, 2013


Categories: Planet DigiPres

FTL at Dartmouth Library

Files That Last - 6 June 2013 - 11:12am

Just for fun, here’s a video of me singing the “Files that Last” song at the Dartmouth College Library. The sound isn’t great, and it’s obvious why I never went for a career as a singer, but it was fun.


Categories: Planet DigiPres

Time to think of the fall

Files That Last - 20 May 2013 - 10:21am

It’s mid-May, and graduations are already starting. Those of you who teach know it isn’t too soon to plan for the fall’s courses. If you’re teaching a course that touches on system management, data maintenance, or preservation issues, you should consider including Files that Last on its reading list.

Preservation Services at Dartmouth College offered a reading list in digital preservation in 2012. That list, which predates FTL, suggests several books which focus on preservation from an institutional standpoint. The Planets Project (which has become the Open Planets Foundation) has an older but longer bibliography in a similar vein. Files that Last complements books like these with its focus on a broader computer audience, the people who need to do preservation as an aspect of their regular work, rather than being primarily information curators.

If your students read Files that Last, it will help them understand the issues of data preservation and loss and appreciate the importance of good data maintenance practices, and they’ll learn habits that will let them better control the data in their own lives and their future jobs.


Categories: Planet DigiPres

Introducing the update page

Files That Last - 16 May 2013 - 9:57am

I’ve launched a page of updates and errata for Files that Last, with some new information on the WebP still image format. As I learn about things that have changed or mistakes in the book, I’ll add to the page.

If you spot anything that you think needs fixing, please let me know.


Categories: Planet DigiPres

«IN/SIDE/OUT» BlogForever meets bloggers from all over the world at re:publica 2013

blog forever - 13 May 2013 - 2:46pm

Where can one personally meet bloggers from 50 countries and discuss the urgent issues of digital society in Europe? Right in the heart of Berlin, Germany.

This year, BlogForever once again presented our consortium’s efforts at re:publica, Germany’s biggest conference for social media issues and innovation (http://www.re-publica.de/). The conference took place for the seventh year in Berlin, with about 5,000 visitors. From the 6th to the 8th of May, 450 speakers presented workshops, discussions and talks about the latest developments in social media all over the world. In such an environment of active bloggers and fresh ideas, it was a great pleasure to be included and to have the chance to discuss our project with the people whose content we hope to preserve.

It was both interesting and promising to see that the content of those discussions had changed from the previous year at re:publica. Many of our booth visitors were already informed and convinced that weblogs should be acknowledged as an important piece of our cultural heritage. Additionally, they were supportive of more robust preservation, of a kind that can better accommodate future research and improve accessibility for the public. The current situation in Syria and in other crisis regions in North Africa was especially noted as underlining the necessity and importance of independent blogging, and of access to such information for journalists, researchers and the public in general.
While bloggers wanted to inform themselves about ways to preserve their own blogs more completely and securely, they also asked for ways to disseminate their blogs more efficiently. Some individuals indicated interest in a large-scale solution for European weblogs, similar to the Internet Archive (http://archive.org) in the United States. BlogForever also met with researchers at several European universities who represented communities of expert bloggers focusing on specific thematic areas, such as science or technology. Several of these individuals expressed the need for a blog preservation solution which could strengthen research networks.

We have come a long way in spreading the message about the importance of digital preservation, in particular of weblogs. The public is now ready to dream about the ways in which preservation can be valuable in the present as well as the future.

Categories: Planet DigiPres

FTL on Amazon

Files That Last - 13 May 2013 - 9:41am

Files that Last is now available as a Kindle e-book.

Smashwords was taking forever to get “technical integration” from Amazon, and when I got a query from a friend about Amazon availability, I decided to go with KDP (Kindle Direct Publishing). Amazon’s registration process isn’t more painful than you’d expect, given that they need to pay me and report my income, and the submission process gives me more control than Smashwords’ does, though it takes more work to take full advantage of it. (The best way to submit a book to KDP is as an HTML file with detailed CSS, and saving as HTML from OpenOffice gives you that. I had to make some manual changes to the CSS for a good result.) This means there are some differences in formatting between the Smashwords and KDP editions. There shouldn’t be any differences in content.

I’m not thrilled with Amazon’s commitment to DRM, closed platforms, and licensing rather than really selling e-books, but I don’t dislike them enough to cut myself off from that market. So if you’ve been holding out for the Kindle version, wait no more!


Categories: Planet DigiPres

Cinco de Mayo sale!

Files That Last - 3 May 2013 - 9:33am

Yes, it’s only tres de mayo, but Sunday is a lousy day to hold a sale. Besides, today is International Day against DRM. From today through the 5th, you can get Files that Last on Smashwords — DRM-free, of course — for the super-low price of $3.20 instead of the usual $7.99. Enter the coupon code TT58Q when buying the book to get this price. If you already have it, why not buy a copy for a friend or colleague?

This applies only to copies bought on Smashwords, not on other sites. Sorry if you prefer to buy on the iTunes store, but I’m not able to issue coupons for other sites.

Correction: Earlier I’d listed $2.99. I wasn’t able to set that price directly on Smashwords, so I had to set a percentage discount instead; 60% off comes to $3.20. Apologies to anyone who was annoyed by the discrepancy.
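
For the curious, the arithmetic behind that figure (assuming Smashwords applies a straight percentage discount to the $7.99 list price) is simply:

    # Illustrative arithmetic only: 60% off the $7.99 list price.
    list_price = 7.99
    discount = 0.60
    sale_price = round(list_price * (1 - discount), 2)
    print(sale_price)  # 3.2, i.e. $3.20 rather than the $2.99 originally listed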


Categories: Planet DigiPres

Hello!

Files That Last - 2 May 2013 - 11:24am

Files that Last is the first e-book on digital preservation directed at “everygeek.” In case your layout doesn’t show you the page links (e.g., on a mobile device), you can read what the book’s about and how to get it here.


Categories: Planet DigiPres

Data Management Planning tools: still immature?

Unsustainable Ideas - 26 April 2013 - 2:43pm

I’ve spent the last few months looking at the JISC data management planning projects. It’s been very interesting. Data management planning for research is still comparatively immature, and so are the tools that are available to support it. The research community needs more and better tools at a number of levels. Here are my thoughts… what do you think?

At group or institution level, we need better “maturity assessment” tools. This refers to tools like:

  • DCC CARDIO for assessing institutional readiness,
  • the DCC Digital Asset Framework for understanding the landscape of data resources,
  • repository risk assessment and quality assessment tools like DRAMBORA, Data Seal of Approval, etc
  • security assessment tools including audits based on ISO 27000.

Some of the existing tools seem rather ad hoc, as if they had emerged and developed from somewhat casual beginnings (perhaps not well put; maybe from beginnings unrelated to the scale of the tasks now facing researchers and institutions). It is perhaps now time for a tool assessment process involving some of the stakeholders, to help map the landscape of potential tools and to use this to plan the development (or replacement) of existing tools.

For example CARDIO and DAF, I’m told, are really tools aimed at people acting in the role of consultants, helping to support a group or institutional assessment process. Perhaps if they could be adjusted to be more self-assessment-oriented, it might be helpful. The DAF resource really needs to be brought up to date and made internally consistent in its terminology.

Perhaps the greatest lack here is a group-oriented research data risk-assessment tool. This could be as simple as a guide-book and a set of spreadsheets. But going through a risk assessment process is a great way to start focusing on the real problems, the issues that could really hurt your data and potentially kill your research, or those that could really help your research and your group’s reputation.
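
To make this concrete, here is a purely illustrative sketch, in Python rather than a spreadsheet, of what such a simple risk register might look like; the risks, the 1–5 scoring scale and the figures are invented for the example rather than drawn from any real assessment.

    # A minimal, hypothetical risk register: score = likelihood x impact (each 1-5).
    risks = [
        {"risk": "Sole copy of raw data held on one researcher's laptop", "likelihood": 4, "impact": 5},
        {"risk": "No documented owner for the group's shared file store", "likelihood": 3, "impact": 3},
        {"risk": "Key analysis scripts depend on unsupported legacy software", "likelihood": 2, "impact": 4},
    ]

    for r in risks:
        r["score"] = r["likelihood"] * r["impact"]

    # Highest-scoring risks first: the issues that could really hurt your data.
    for r in sorted(risks, key=lambda r: r["score"], reverse=True):
        print(f'{r["score"]:>2}  {r["risk"]}')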

We also need better DMP-writing tools, ie better versions of DMPonline or DMP Tool. The DCC recognises that DMPonline needs enhancement, and has written in outline about what they want to do, all of which sounds admirable. My only slight concern is that the current approach, with templates for funders, disciplines and institutions in order to reflect all the different nuances, requirements and advice, sounds like a combinatorial explosion (I may have misunderstood this). It is possible that the DMP Tool approach might reduce this combinatorial explosion, or at least parcel elements of it out to the institutions, making it more manageable.
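
To illustrate the worry with some invented numbers (the counts below are made up purely for the sake of the arithmetic): if every funder/discipline/institution combination needs its own template, the centrally maintained set multiplies out, whereas parcelling the institutional layer out to the institutions keeps it roughly additive.

    # Hypothetical counts, purely for illustration.
    funders, disciplines, institutions = 20, 30, 150

    # One template per funder x discipline x institution combination, all held centrally:
    central_templates = funders * disciplines * institutions   # 90000

    # Funder x discipline templates held centrally, institutional guidance devolved:
    devolved_templates = funders * disciplines + institutions  # 750

    print(central_templates, devolved_templates)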

The other key thing about these tools is that they need better support. This means more resources for development and maintenance. That might mean more money, or it might mean building a better Open Source partnership arrangement. DMPonline does get some codebase contributions already, but the impression is that the DMP Tool partnership model has greater potential to be sustainable in the absence of external funding, which must eventually be the situation for these tools.

It is worth emphasising that this is nevertheless a pretty powerful set of tools, and potentially very valuable to researchers planning their projects, and to institutions, departments etc trying to establish the necessary infrastructure.


Categories: Planet DigiPres

The 2013 International Data Rescue Award in the Geosciences – deadline Oct 10 2013

Alliance for Permanent Access News - 19 April 2013 - 2:14pm

Organised by IEDA and Elsevier Research Data Services, the International Data Rescue Award in the Geosciences was created to improve the preservation of, and access to, research data, particularly dark data, and to share the varied ways in which these data are being processed, stored and used. For more information, see http://researchdata.elsevier.com/datachallenge
The organisers are interested in receiving submissions from groups who have developed and completed projects that have digitized previously unavailable content or that have facilitated and improved the ingestion of research data. The final submission deadline is October 10, 2013.

Categories: Planet DigiPres

Reply to DSHR's Comment

Digital Continuity Blog - 8 April 2013 - 3:26pm

I ran over the comment length limit on David Rosenthal’s blog when I tried to reply to his reply to my comment there. I’ve included my reply below instead.

Hi David,

The problem I see is that we fundamentally disagree on the framing of the digital preservation challenge. I meant to reply to your last "refutation" of Jeff Rothenberg’s presentation at Future Perfect 2012 but hadn’t gotten around to it yet. Perhaps now is a good time. I was the one that organised Jeff’s visit and presentation and I talked with him about his views both before and after so I have a pretty good idea of what he was trying to say.  I won’t try to put words into his mouth though and will instead give my (similar) views below.

The digital preservation challenge, as I see it, is to preserve digitally stored or accessed content over time. I think we can both agree that if we aren’t leaving something unchanged then we aren’t preserving anything. So, to me, the digital preservation challenge requires that we ensure that the content is unchanged over time.

Now I’m not sure if you would agree that that is what we are trying to do. If you do, then it seems we disagree on what the content is that we are trying to preserve.  If you disagree that that is what we are trying to do then at least we might be able to make some progress on figuring out what the disagreement stems from.

So, assuming you can at least understand my perspective, I’d also like to address your comments about format obsolescence. I’m not a proponent of the idea of format obsolescence. The idea makes little sense to me. However I am a proponent of a weak form of the idea of software obsolescence and, more importantly, the associated idea of content loss due to software obsolescence.

The weaker form of the idea of software obsolescence that I’m a proponent of is that because of hardware changes, software loss and loss of understanding about how to use software, software becomes unusable using current technology without active intervention.

The associated idea of content loss that I am a proponent of is the idea that to successfully preserve many types of content you need to preserve software that that content relies upon in order to be presented to users and interacted with. A stronger way of putting that is to say that in many cases, the thing to be preserved is so inextricably connected to the software that the software is part of that thing.

If you take that leap to accepting (whether fully or in order to simplify the explanation) that the software is part of the thing to be preserved, then it becomes obvious that practitioners who are  only doing migration are in many cases not doing real preservation as they are not preserving the entirety of the objects.  Hence Jeff’s presentation in which he reprimanded the community for not really making progress since the early 2000s.  Almost nobody is preserving the software functionality.

As it is relevant to your post and comments, I’ll use a web page as an example to illustrate what I mean. The content that a traditional web page presents to users for interaction is produced from a number of digital files, including server-hosted files (e.g. the web server and applications, the HTML/XHTML pages, scripts, images and audio) and locally hosted files (such as the browser, fonts, browser skins, extensions etc.). The combination of these files, mediated by usually at least two computers (the server and the client), together presents content that the user can interact with. Changing any one of the files involved in this process may change the content presented to the user. To preserve such a page, it is my view that we need to start by deciding what content makes up the page, so that we can begin to preserve it and so that we can confirm that that content has been preserved and is still there in an unchanged form at some point in the future. In most cases it’s likely that all that needs to be preserved is the basic text and images in the page and their general layout. If that is all, then migration techniques may well be appropriate if the browser ever becomes unable to render the text and images (though I agree with you that that doesn’t seem necessary yet, or likely to become necessary in a hurry). However there are two difficulties with this scenario:

  1. There will be many cases where the content includes interactive components and/or things that have software dependencies.
  2. When you don’t know, or can’t affordably identify, the content to be preserved, preserving as much as possible, cheaply, is your best option.

Difficulty (1) means that you will require some solution that involves preserving the software’s functionality, and I believe that (2) means you should use an emulation-based technique to preserve the content.

Emulation-based techniques are highly scalable (across many pieces of digital content) and so benefit from economies of scale. Emulation strategies and tools, once fully realised, will, I believe, provide a cheaper option when you factor in the cost of confirming the preservation of the content.

It’s a bit like the global warming problem. Most products and services do not include the carbon cost in them. If they did they would likely be much more expensive. Well, I believe digital preservation solutions are similar: if you factor in the costs of confirming/verifying the preservation of the content you are trying to preserve, then many solutions are likely to be prohibitively expensive, as they will require manual intervention at the individual object level. Emulation solutions, on the other hand, can be verified at the environment level and applied across many objects, greatly reducing costs.
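
To illustrate that claim with a deliberately crude cost model (every figure and the split between checks below is an assumption invented for the example, not a measured cost):

    # Hypothetical verification costs for 100,000 preserved objects; all numbers invented.
    n_objects = 100_000

    # Migration-style verification: each migrated object is individually rendered
    # and compared with the original to confirm its content is unchanged.
    per_object_check = 2.00                      # cost per object, manual intervention
    migration_verification = n_objects * per_object_check

    # Emulation-style verification: verify the preserved environment once, then rely
    # on cheap automated bit-level fixity checks for the objects it renders.
    environment_check = 10_000.00                # one-off environment-level verification
    fixity_check = 0.01                          # automated checksum per object
    emulation_verification = environment_check + n_objects * fixity_check

    print(migration_verification)                # 200000.0
    print(emulation_verification)                # 11000.0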

So as I see it, it is not about format obsolescence but about (a weak form of) software obsolescence, and about the preservation of content that can’t be separated from software.

In your post you seemed to be suggesting something similar: that content needed to be preserved that was heavily reliant upon browsers and server-based applications. You also discussed a number of approaches, including some that involved creating and maintaining virtual machines, and followed that with the statement that “the most important thing going forward will be to deploy a variety of approaches”. I took that to mean you had softened a little in your attitude towards using emulation to preserve content over time.

Sorry, I seem to have misunderstood.

Categories: Planet DigiPres

Is the PDF format appropriate for preserving documents with long perspective?

Unsustainable Ideas - 19 March 2013 - 11:21am

Paul Wheatley drew attention to this question on Stack Exchange yesterday:

“PDF is almost a de facto standard when it comes to exchanging documents. One of the best things is that always, on each machine, the page numbers stay the same, so it can be easily cited in academic publications etc.

But de facto standard is also opening PDFs with Acrobat Reader. So the single company is making it all functioning fluently.

However, thinking in longer perspective, say 50 years, is it a good idea to store documents as PDFs? Is the PDF format documented good enough to ensure that after 50 years it will be relatively easy to write software that will read such documents, taking into account that PDF may be then completely deprecated and no longer supported?”

I tried to respond, but fell foul of Stack Exchange’s login/password rules, which mean I’ve created a password I can’t remember. And I was grumpy because our boiler isn’t working AFTER it’s just been serviced (yesterday, too), so I was (and am) cold. Anyway, I’ve tried answering on SE before and had trouble, and I thought I needed a bit more space to respond. My short answer was going to be:

“There are many many PDF readers available implemented independently of Adobe. There are so many documents around in PDF, accessed so frequently, that the software is under constant development, and there is NO realistic probability that PDF will be unreadable in 50 years, unless there is a complete catastrophe (in which case, PDF is the least of your worries). This is not to say that all PDF documents will render exactly as now.”

Let’s backtrack. Conscious preservation of artefacts of any kind is about managing risk. So to answer the question about whether a particular preservation tactic (in this case using PDF as an encoding format for information) is appropriate for a 50-year preservation timescale, you MUST think about risks.

Frankly, most of the risks for any arbitrary document (a container for an intellectual creation) have little to do with the format. Risks independent of format include:

  • whether the intellectual creation is captured at all in document form,
  • whether the document itself survives long enough and is regarded as valuable enough to enter any system that intends to preserve it,
  • whether such a system itself can be sustained over 50 years (the economic risks here being high),
  • not to mention whether in 50 years we will still have anything like current computer and internet systems, or electricity, or even any kind of civilisation!

So, if we are thinking about the risks to a document based on its format, we are only thinking about a small part of the total risk picture. What might format-based risks be?

  • whether the format is closed and proprietary
  • whether the format is “standardised”
  • whether the format is aggressively protected by IP laws, eg copyright, trademark, patents etc
  • whether the format requires, or allows, DRM
  • whether the format requires (or allows) inclusion of other formats
  • the complexity of the format
  • whether the development of the format generally allows backwards compatibility
  • whether the format is widely used
  • whether tools to access the format are closed and licensed
  • whether tools to access the format are linked to particular computer systems environments
  • whether various independent tools exist
  • how good independent tools are at creating, processing or rendering the format

and no doubt others. By the way, the impacts of these risks all differ; you have to think about them for each case.
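
One way to make a checklist like this usable in practice is to turn it into a rough scoring exercise. The sketch below is only an illustration of the idea: the weights, the factors chosen and the example scores are invented, and any real assessment would need its own judgement about both.

    # Hypothetical weighted scoring of format-based risks; weights and scores are invented.
    risk_weights = {
        "closed and proprietary": 3,
        "not standardised": 2,
        "aggressively protected by IP laws": 2,
        "requires or allows DRM": 3,
        "high complexity": 2,
        "poor backwards compatibility": 2,
        "narrow adoption": 3,
        "tools closed and licensed": 2,
        "tools tied to particular platforms": 1,
        "no independent tools": 3,
        "independent tools render poorly": 1,
    }

    # Exposure of one candidate format on each factor, 0.0 (no concern) to 1.0 (full concern).
    candidate_format = {factor: 0.0 for factor in risk_weights}
    candidate_format["requires or allows DRM"] = 0.5
    candidate_format["high complexity"] = 0.8
    candidate_format["independent tools render poorly"] = 0.4

    score = sum(w * candidate_format[f] for f, w in risk_weights.items())
    print(round(score, 2))  # lower is better; compare candidate formats against each other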

So let’s see how PDF does… no, hang on. There are several families within PDF. There’s the “bog-standard” PDF. There’s PDF/A up to v2. There’s PDF/A v3. There are a couple of other variants including one for technical engineering documents. Let’s just think about “bog-standard” PDF: Adobe PDF 1.7, technically equivalent to ISO standard ISO 32000-1:2008:

  • The format was proprietary but openly documented; it is now fully open
  • it is the subject of an ISO standard, out of the control of Adobe (this might have its own risks, including the lack of openness of ISO standards, and the future development of the standard)
  • it allows, but does not require DRM
  • it allows, but does not require the inclusion of other formats
  • PDF is very complex and allows the creation of documents in many different ways, not all of which are useful for all future purposes (for example, the characters in a text can be in completely arbitrary order, placed by location on the page rather than textual sequence)
  • PDF has generally had pretty good backwards compatibility
  • the format is extremely widely used, with many billions of documents worldwide, and no sign of usage dropping (so there will be continuing operational pressure for PDF to remain accessible)
  • many PDF creating and reading tools are available from multiple independent tool creators; some tools are open source (so you are not likely to have to write such tools)
  • PDF tools exist on almost all computer systems in wide use today
  • some independent PDF tools have problems with some aspects of PDF documents, so rendering may not be completely accurate (it’s also possible that some Adobe tools will have problems with PDFs created by independent tools). Your mileage may vary.

So, the net effect of all of that, it seems to me, is that provided you steer clear of a few of the obvious hurdles (particularly DRM), it is reasonable to assume that PDF is perfectly fine for preserving most documents for 50 years or so.

What do you think?


Categories: Planet DigiPres

Open postcode? That’ll be a “no” then!

Unsustainable Ideas - 14 March 2013 - 11:12am

A month or so ago I got an email from the OpenRightsGroup, asking me to write to a minister supporting the idea of retaining the Postcode database as Royal Mail is privatised, and making it Open. The suggested text was as follows:

“Dear [Minister of State for Business and Enterprise]
“We live in an age where location services underpin a great chunk of the economy, public service delivery and reach intimate aspects of our lives through the rise of smartphones and in-car GPS. Every trip from A to B starts and ends in a postcode.
“In this context, a national database of addresses is both a critical national asset and a natural monopoly, which should not be commercially exploited by a single entity. Instead, the Postcode Address File should be made available for free reuse as part of our national public infrastructure. The postcode is now an essential part of daily life for many purposes. Open availability would create re-use and mashup opportunities with an economic value far in excess of what can be realised from a restrictive licence.
“I am writing to you as the minister responsible to ask for a public commitment to:
“1) Keep the Postcode Address File (PAF) under public ownership in the event of the Royal Mail being privatised.
“2) Release the PAF as part of a free and open National Address Dataset.”

A few days ago I got a response. I think it must be from a person, as the writer managed to mis-spell my name (not likely to endear him (or her) to me!)

“Dear Mr Rushbridge,

“Thank you for your email of 6 February to the Minister for Business and Enterprise, Michael Fallon MP, regarding the Postcode Address File (PAF).

“I trust you will understand that the Minister receives large amounts of correspondence every day and regretfully is unable to reply to each one personally.  I have been asked to reply.

“The Government’s primary objective in relation to Royal Mail is to secure a sustainable universal postal service.  The postcode was developed by Royal Mail in order to aid delivery of the post and is integral to Royal Mail’s nationwide operations.  However, we recognise that postcode data has now become an important component of many other applications, for example sat-navs.

“In light of PAF’s importance to other users, there is legislation in place to ensure that PAF must be made available to anyone who wishes to use it on terms that are reasonable.  This allows Royal Mail to charge an appropriate fee whilst also ensuring that other users have access to the data.  The requirement is set out in the Postal Services Act 2000 (as amended by the Postal Services Act 2011) and will apply regardless of who owns Royal Mail.  It is this regulatory regime, and not ownership of Royal Mail, that will ensure that PAF continues to be made available on reasonable terms.  Furthermore, Ofcom, the independent Regulator, has the power to direct Royal Mail as to what ‘reasonable’ terms are.  Ofcom are currently consulting on the issue of PAF regulation and more information can be found on their website at: http://www.ofcom.org.uk.

“On the question of a National Address Register, the UK already has one of the most comprehensive addressing data-sets in the world in the form of the National Address Gazetteer (NAG).  The NAG brings together addressing and location data from Ordnance Survey, Local Authorities and Royal Mail; the Government is committed to its continuation as the UK’s definitive addressing register.

“The Government is similarly committed to ensuring that the NAG is used to its full benefit by both public and private sector users, and keeps pricing and licensing arrangements under review with the data owners.  Alongside our commitment to the NAG, the Government is continuing to consider the feasibility of a national address register.

“I trust you will find this information helpful in explaining the position on this subject.

“Yours sincerely,

“BIS MINISTERIAL CORRESPONDENCE UNIT”

So, that’ll be a “No” then. But wait! Maybe there’s a free/open option? No such luck! From Royal Mail’s website, it looks like £4,000 for unlimited use of the entire PAF (for a year?), or £1 per 100 clicks. You can’t build an open mashup on that basis. Plus there’s a bunch of licences to work out and sign.

What about the wonderful National Address Gazetteer? It’s a bit hard to find out, as there seem to be multiple suppliers, mainly private sector. Ordnance Survey offers AddressBase via their GeoPlace partnership, which appears [pdf] to cost £129,950 per year plus £0.008 per address for the first 5 million addresses! So that’s not exactly an Open alternative, either!
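
To put rough numbers on why neither of these looks like an open alternative (the usage figure is invented, the prices are those quoted above, and I’m reading the PAF click rate as £1 per 100 address lookups):

    # Back-of-the-envelope costs for a hypothetical mashup doing 5 million lookups a year.
    lookups = 5_000_000

    # Royal Mail PAF, per the prices quoted above.
    paf_unlimited = 4_000                        # pounds per year for unlimited use
    paf_per_click = (lookups / 100) * 1          # 50000.0 pounds at 1 pound per 100 clicks

    # Ordnance Survey AddressBase via GeoPlace, per the quoted [pdf] price list.
    addressbase = 129_950 + 0.008 * 5_000_000    # 169950.0 pounds for the first 5 million addresses

    print(paf_unlimited, paf_per_click, addressbase)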

Now I’m all for Royal Mail being sustainable. But overall, I wonder how much better off the whole economy would be with an Open PAF than with a closed PAF?


Categories: Planet DigiPres