The browser-shots tool is developed by Internet Memory in the context of the SCAPE project, as part of the Preservation and Watch (PW) sub-project. The goal of this tool is to perform automatic visual comparisons in order to detect rendering issues in archived Web pages.
This two-day event will guide delegates through the steps of the SCAPE Preservation Planning and Watch lifecycle. It will begin with an introduction to the SCAPE Planning and Watch ecosystem, its components and how they support the full lifecycle of digital objects.
Delegates will learn about both human-readable and machine-readable policy drivers and will create policy statements.
Following this, delegates will be introduced to a number of tools that have been developed or extended as part of the SCAPE project. These include the content-profiling tool, C3PO, which will enable you to perform deep analysis of your digital content; Scout, a preservation watch system which can detect preservation risks and opportunities through its knowledge base and automatically notify users of new risks and opportunities; and Plato, a decision-making tool that implements a solid preservation planning process and integrates services for content characterisation, preservation action and automatic object comparison.
Learning Outcomes (by the end of the training event the attendees will be able to):
- Understand the SCAPE Planning and Watch lifecycle, and the benefits and opportunities.
- Explain why a policy framework is important in successful preservation planning.
- Evaluate when and how to execute preservation planning.
- Apply preservation policy to a planning scenario.
- Carry out content profiling experiments.
- Analyse content profile results.
- Understand how to monitor your repository and its environment.
- Create a preservation plan.
- Envisage how the SCAPE components can be applied to your own collection.
Delegates will receive a certificate of attendance for the training course.
The draft agenda is available here: http://wiki.opf-labs.org/display/SP/Agenda+-+Effective%2C+Evidence-Based+Preservation+Planning
The event will be conducted in English.
Who should attend?
Practitioners (digital librarians and archivists, digital curators, repository managers, or anyone responsible for managing digital collections). To get the most out of this training course you will ideally have some knowledge or experience of digital preservation.
Developers who are interested in learning about automated preservation planning and policies.
Please register at: https://scape-preservation-planning.eventbrite.co.uk . The cost for the two days is 800 DKK (approximately €107 / £92).
Morning and afternoon coffee breaks and lunch will be provided and are included in the registration fee.
*Please ensure you bring your laptop with you so you can participate in the practical exercises.*
Registration will close on Wednesday 6 November.
Please visit the event wiki for travel and accommodation information.
For more information about the SCAPE project visit: http://www.scape-project.eu/
The SCAPE Project and OPF are running a hackathon for developers and practitioners, focussing on Hadoop, an open source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Hadoop is designed to scale out from single servers to thousands of machines.
Jimmy Lin from the University of Maryland will be our guest speaker at the event. Jimmy has been working with Big Data and Hadoop for many years, with a focus on natural language processing and information retrieval. He spent an extended sabbatical at Twitter from 2010 to 2012 working on large-scale analytics, on which he provides valuable insights in his 2013 HadoopsummitEU talk ‘Big Data Mining Infrastructure: The Twitter Experience’ (http://www.youtube.com/watch?v=T5ZjSFnOxys). He has a book out on MapReduce (http://lintool.github.io/MapReduceAlgorithms/) and is currently working on a scalable rendering engine for web archives based on HBase.
We will be working with two digital preservation scenarios:
- Web Archiving: File Format Identification/Characterisation
- Digital Books: Quality Assurance, text mining (OCR Quality)
Alternatively, if you have something else you would like to work on using Hadoop, just let us know; we are keen to hear your ideas.
Practitioners and developers will work together in groups to address digital preservation challenges using Hadoop. Practitioners will take the role of issue champion, and will articulate their requirements to the developers and document them on the wiki. Developers will brainstorm ideas and work on solutions to the issues. There will be regular check-in points to get feedback and refine requirements. There will be a prize for the best issue champion and development solution.
All participants will gain practical experience of using digital preservation tools in characterisation and quality assurance processes. We will provide step-by-step worksheets for those who are less familiar with using the command line, and our experts will be on hand to help you through them.
There will be plenty of opportunities for discussion. We have a session for sharing experiences of implementing Hadoop at your organisation, research project reports and a breakout space for lightning talks. We welcome suggestions for talks or discussions you would like to hear about.
The draft agenda can be seen at: http://wiki.opf-labs.org/display/SP/Agenda+-+Hadoop+Driven+Digital+Preservation
Who should attend?
Practitioners (digital librarians and archivists, digital curators, repository managers, or anyone responsible for managing digital collections). You will learn how Hadoop might fit your organisation and how to write requirements to guide development, and you will gain some hands-on experience of using the tools yourself and finding out how they work. To get the most out of this training course you will ideally have some knowledge or experience of digital preservation.
Developers of all experience levels can participate, from writing your first Hadoop jobs to working on scalable solutions for issues identified in the scenarios.
Please register here: https://hadoop-driven-digital-preservation.eventbrite.co.uk.
OPF members are invited to attend free of charge. Please use the code issued by email to waive the fee.
Non-members are welcome to attend at a cost of €200. Morning and afternoon coffee breaks and lunch will be provided and are included in the registration fee.
*Early bird rate*: register before 25 October to get 10% off.
Registration will close on Monday 25 November.
For information about travel and accommodation please visit the event wiki page: http://wiki.opf-labs.org/pages/viewpage.action?pageId=32604217.
It's been more than two years now since I wrote my D-Lib paper JPEG 2000 for Long-term Preservation: JP2 as a Preservation Format. From time to time people ask me about the status of the issues that are mentioned in that paper, so here's a long overdue update.
Preserving Cultural Heritage