The Signal: Digital Preservation
In this interview, FADGI talks with Hannah Frost, Digital Library Services Manager at Stanford Libraries and Manager of the Stanford Media Preservation Lab, and Jenny Brice, Preservation Coordinator at the Bay Area Video Coalition, about the AV Artifact Atlas.
One of my favorite aspects of the Federal Agencies Digitization Guidelines Initiative is its community-based ethos. We work collaboratively across federal agencies on shared problems and strive to share our results so that everyone can benefit. We’ve had a number of strong successes including the BWF MetaEdit tool, which has been downloaded from SourceForge over 10,000 times. In FADGI, we’re committed to making our products and processes as open as possible and we’re always pleased to talk with other like-minded folks such as Hannah Frost and Jenny Brice from the AV Artifact Atlas project.
The AV Artifact Atlas is another community-based project that grew out of a shared desire to identify and document the technical issues and anomalies that can afflict audio and video signals. What started out as a casual discussion about quality control over vegetarian po’boy sandwiches at the 2010 Association of Moving Image Archivists annual meeting has evolved into an online knowledge repository of audiovisual artifacts for in-house digitization labs and commercial vendors. It’s helping to define a shared vocabulary and will have a significant impact on codifying quality control efforts.
For an overview of AVAA, check out The AV Artifact Atlas: Two Years In on the Media Preservation blog from the Media Preservation Initiative at Indiana University Bloomington.
Kate: Tell me how the AV Artifact Atlas came about.
Hannah: When we get together, media preservation folks talk about the challenges we face in our work. One of the topics that seems to come up over and over again is quality and the need for better tools and more information to support our efforts to capture and maintain high quality copies of original content as it is migrated forward into new formats.
When creating, copying, or playing back a recording, there are so many chances for error, for things to go sideways, lowering the quality or introducing some imperfection to the signal. These imperfections leave behind audible or visible artifacts (though some are more perceptible than others). If we inspect and pay close attention, it is possible to discover the artifacts and consider what action, if any, can be taken to prevent or correct them.
The problem is most archivists, curators and conservators involved in media reformatting are ill-equipped to detect artifacts, let alone understand their cause and ensure a high-quality job. They typically don’t have deep training or practical experience working with legacy media. After all, why should we? This knowledge is by and large the expertise of video and audio engineers and is increasingly rare as the analog generation ages, retires and passes on. Over the years, engineers sometimes have used different words or imprecise language to describe the same thing, making the technical terminology even more intimidating or inaccessible to the uninitiated. We need a way to capture and codify this information into something broadly useful. Preserving archival audiovisual media is a major challenge facing libraries, archives and museums today and it will challenge us for some time. We need all the legs up we can get.
AV Artifact Atlas is a leg up. We realized that we would benefit from a common place for accumulating and sharing our knowledge and questions about the kinds of issues revealed or introduced in media digitization, technical issues that invariably relate to the quality of the file produced in the workflow. A wiki seemed like a natural fit given the community orientation of the project. I got the term “artifact atlas” from imaging guru Don Williams, an expert adviser for the FADGI Still Image Working Group.
Initially we saw the AV Artifact Atlas as a resource to augment quality control processes and as a way to structure a common vocabulary for technical terms in order to help archivists, vendors and content users to communicate, to discuss, to demystify and to disambiguate. And people are using it this way: I’ve seen it on listservs.
But we have also observed that the Atlas is a useful resource for on-the-job training and archival and conservation education. It’s extremely popular with people new to the field who want to learn more and strengthen their technical knowledge.
Kate: How is the AVAA governed? What’s Stanford Media Preservation Lab’s role and what’s Bay Area Video Coalition’s role?
Hannah: The Stanford Media Preservation Lab team led the initial development of the site, which started in 2012 and we’ve been steadily adding content ever since. We approached BAVC as an able partner because BAVC demonstrates an ongoing commitment to the media community and a genuine interest in furthering progress in the media archiving field.
Jenny: Up until this past year, BAVC’s role has primarily been to host the AVAA. We’ve always wanted to get more involved in adding content, but haven’t had the resources. When we started planning for the QC Tools project, we saw the AVAA as a great platform and dissemination point for the software we were developing. Through funding from the National Endowment for the Humanities, we now have the opportunity to focus on actively developing the analog video content in the AVAA. The team at SMPL has been a huge part of the planning process for this stage of the project, offering invaluable advice, ideas and feedback.
Over the next year, BAVC will be leading a project to solicit knowledge, expertise and examples of artifacts found in digitized analog video from the wider AV preservation community to incorporate into the AVAA. Although BAVC is leading this leg of the project, SMPL will be involved every step of the way.
Kate: You mentioned the Quality Control Tools for Video Preservation or QC Tools project. How does the AVAA fit into that?
Jenny: In 2013, BAVC received funding from the NEH to develop a software tool that analyzes video files to identify and graph errors and artifacts. You can drop a digital video file into the software program and it will produce a set of graphs from which various errors and artifacts can be pinpointed. QC Tools will show where a headclog happens and then connect the user to the AVAA to understand what a headclog is and if it can be fixed. QC Tools will make it easier for technicians digitizing analog video to do quality control of their work. It will also make it easier for archivists and other people responsible for analog video collections to quality check video files they receive from vendors, as well as accurately document video files for preservation. The AVAA, by providing a common language for artifacts as well as detailed descriptions of their origin and resolution (if any), helps serve these same purposes.
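The heart of that kind of tool is per-frame signal measurement: compute a statistic for every frame, then look for frames that break the clip-wide pattern. As an illustration only (a toy sketch in Python, not BAVC’s actual code or QC Tools’ real measurements):

```python
from statistics import mean, stdev

def frame_stats(frames):
    """Average luma (0-255) of each frame -- the kind of per-frame
    measurement a QC tool graphs across a digitized video."""
    return [mean(f) for f in frames]

def flag_outliers(values, z=2.5):
    """Indices of frames whose average deviates sharply from the
    clip-wide norm -- candidates for manual review against the Atlas."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if sigma and abs(v - mu) > z * sigma]

# Synthetic clip: steady mid-gray frames with one near-black dropout.
frames = [[128] * 16 for _ in range(20)]
frames[7] = [8] * 16
suspect = flag_outliers(frame_stats(frames))  # frame 7 stands out
```

A real tool plots many such statistics (luma range, saturation, temporal difference) so that a trained eye can match a spike in the graph to a named artifact in a resource like the AVAA.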
Kate: My favorite AVAA entry is probably the one for Interstitial Errors because it’s an issue that FADGI is actively working on. (In fact, when I mentioned this project in a previous blog post, you’ll notice a link to the AVAA in the Interstitial Error caption!) What topics stand out for you and why?
Jenny: When I first started interning at BAVC, I was totally new to video digitization. I relied heavily on the AVAA to help me understand what I was seeing on screen, why it was happening and what (if anything) could be done. The entries for Video Head Clog, Tracking Error and Tape Crease hold a special place in my heart because I saw them often when digitizing, and it took many, many repeat views of the examples in the AVAA before I could reliably tell them apart.
Hannah: There are so many to choose from! One highlight is SDI Spike, because it is a great example of a digitization error – and a pretty egregious one at that – and thus demonstrates exactly why careful quality control is critical in preservation workflows. The DV Head Clog entry is noteworthy, as the clip shows how dramatic digital media errors can be, especially when compared to analog ones. Other favorite entries include those that give the reader lots of helpful, practical information about resolving the problem, as seen in Crushed Setup and Head Switching Noise.
Kate: Where do you get your visual examples and data for the Atlas? Are there gaps you’re looking to fill?
Hannah: Many of the entries were created by SMPL staff, drawing on research we’ve done and our on-the-job experience, and most of the media clips and still images derive from issues we encountered in our reformatting projects. A few other generous folks have contributed samples and content, too. We are currently in the process of incorporating content from the Compendium of Image Errors in Analogue Video, a superb book published in 2012 that was motivated by the same need for information to support media art conservation. We are deeply grateful to authors Joanna Phillips and Agathe Jarczyk for working with us on that.
Our biggest content gaps are in the area of audio: we are very eager for more archivists, conservators, engineers and vendors to contribute entries with examples! Also the digital video area needs more fleshing out. The analog video section is pretty well developed at this point, but we still need frames or clips demonstrating errors like Loss of Color Lock and Low RF. We keep a running list of existing entries that are lacking real-life examples on the Contributor’s Guide page.
Kate: I love the recently added audio examples to augment the visual examples. It’s great to not only see the error but also to hear it. How did this come about and what other improvements/next steps are in the works?
Hannah: Emily Perkins, a student at the University of Texas School of Information, approached us about adding the Sound Gallery as part of her final capstone project. Student involvement in the Atlas development is clearly a win-win situation, so we encourage more of that! We are also currently planning to implement a new way to navigate the content in terms of error origin. The new categories – operator error, device error, carrier error, production error – will help those Atlas users who want to better understand the nature of these errors and how they come about.
Jenny: As part of the NEH project, we want to look closely at the terms and definitions and correlate them with other resources, such as the Compendium of Image Errors in Analogue Video that Hannah mentioned. We also want to include more examples – both still images and video clips – to help illustrate artifacts. As QC Tools becomes more developed, we want to include some of the graphs of common artifacts produced by the software. The hope is that users of the AVAA or of QC Tools will have more than one way to identify the artifacts they encounter.
Kate: It can be challenging to keep the content and enthusiasm going for community-based efforts. What have you learned since the project launched and how has it influenced your current approach?
Hannah: So true: keeping the momentum going is a real challenge. Most of the contributions made to date have been entirely voluntary, and while the NEH funding is a welcome and wonderful development – not to mention a vote of confidence that the Atlas is a valuable resource – we understand full well that generous donations of time and knowledge on the part of novice and expert practitioners will always be fundamental to the continued growth and success of the Atlas.
It definitely takes a core group of committed people to keep the momentum going and you always need to beat the bushes for contributions. In our day-to-day work at SMPL, it has come to the point where I routinely ask myself about a problem we encounter: “is this something we can add to the Atlas? Have we just learned something that we can share with others?” If more practitioners adopted this frame of mind, the wiki would certainly develop more rapidly! I also try to remind folks that you don’t have to be an expert engineer to contribute. Practical information from and for all levels of expertise is our primary goal.
Kate: Is there anything else you’d like to mention about AVAA?
Jenny: We’re hiring! Thanks to funding from the NEH, we are able to hire someone part-time to work exclusively on building out content and community for the AV Artifact Atlas. If you are passionate and knowledgeable about video preservation, consider applying. We’re really excited to hire a dedicated AVAA Coordinator and to see how this position will help the Atlas grow!
The following is a guest post by Heidi Dowding, Resident at the Dumbarton Oaks Research Library in Washington, DC
As part of the National Digital Stewardship Residency program’s biweekly takeover of The Signal, I’m here to talk about my project at Dumbarton Oaks Research Library and Collection. And by the way, if you haven’t already checked out Emily Reynolds’ post on the residency four months in as a primer, go back and read that first. I’ll wait.
OK then, on we go.
My brief history in residence at this unique institution technically started in September, but really the project dates back a little over a year to a digital asset management information gathering survey that was undertaken by staff at Dumbarton Oaks. Concerned with DO’s shrinking digital storage capacity, they were hoping to find out how various departments were handling their digital assets. What they discovered was that, with no central policy guiding digital asset management within the institution, ad hoc practices were overlapping and causing manifold problems.
This is about where my project entered the scene. As part of the first cohort of NDSR residents, I’ve been tasked with identifying an institution-wide solution to digital asset management. This has first involved developing a deep (at times, file-level) understanding of Dumbarton Oaks’ digital holdings. These include the standard fare – image collections, digital books, etc. – but also more specialized content like the multimedia Oral History Project and the GIS Tree Care Inventory. I started my research with an initial survey sent to everyone around the institution, and then undertook interviews and focus groups with key staff in every department.
While I uncovered a lot of nuanced information about user behaviors, institutional needs, and the challenges we currently face, the top-level findings are threefold.
First, relationships within an institution make or break its digital asset management.
This is largely because each department has a different workflow for managing assets, but no department is an island. In interdepartmental collaborations, digital assets are being duplicated and inconsistently named. This is especially apparent in the editorial process at DO, wherein an Area of Study department acts as intermediary between the Publications department and various original authors. Duplicate copies are being saved on various drives around the institution, with very little incentive to clean and organize files once a project has been completed.
In this case, defined policies will aid in the development of interdepartmental collaborations in digital projects. My recommendation of a Digital Asset Management System (DAMS) will also hopefully aid in the deduplication of DO’s digital holdings.
Second, file formats are causing big challenges. Sometimes I even ran into them in my own research.
Other times, these challenges were more serious around the institution, caused by a lack of timely software updates for some of our more specialized systems or by a general proliferation of file formats. A lot of these issues could be addressed by a central policy based on the file format action plans discussed by NDSR resident Lee Nilsson. Effective plans should address migration schedules and file format best practices.
Finally, staff need to be more proactive in differentiating between archival digital assets and everyday files.
By archival digital assets, I mean things like images from the ICFA or photographs of the gardens; by everyday files, I mean things like word processing documents. This behavior becomes particularly problematic depending on where items are saved: many of the departmental drives are only backed up monthly, while a bigger institutional drive collectively referred to as ‘the Shared Drive’ is backed up daily. So if everyday items are being stored on a departmental drive, there is a much higher likelihood of data loss because recent work has no backup copy. Likewise, if archival assets are put there with no local copy stored until the scheduled backup, really important digital assets could be lost. Finally, this also becomes problematic when digital assets are stored long-term on the Shared Drive – they take up precious space and are not being properly organized and cared for.
My job over the next few months will be to look at potential Digital Asset Management Systems to determine whether a specific tool would assist Dumbarton Oaks’ staff in better managing digital files. I will also be convening a Digital Preservation Working Group to carry on my work after my residency ends in May.
Please check out NDSR at the upcoming ALA Midwinter Digital Preservation Interest Group meeting at 8:30am on Sunday, January 24 in the Pennsylvania Room.
In my work at the Library, one of my larger projects has to do with the acquisition and preservation of eserials. By this I don’t mean access to licensed and hosted eserials, but the acquisition and preservation of eserial article files that come to the Library.
In many ways, this is just like other acquisition streams and workflows: some specifications for the content are identified; electronic transfer mechanisms are put in place; processing includes automated and human actions including inspection, metadata extraction and enrichment, and organization; and files are moved to the appropriate storage locations.
In other ways, though, eserials are distinctive. They are serials, with a complex organization of files/articles/issues/volumes/titles. There are multiple formats, content, and metadata standards in play. Publishers now often have a very frequent article-based publishing model that includes versions and updates. And the packages of files to be transferred between and within organizations can have many formats.
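One concrete piece of the automated inspection step is fixity checking: comparing each delivered file against a checksum manifest supplied by the publisher. A minimal sketch in Python (the file names, manifest format and helper names here are invented for illustration, not part of any actual transfer specification):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_package(files, manifest):
    """Compare each delivered file against the checksum the publisher
    supplied; return the names of files that fail inspection."""
    return [name for name, data in files.items()
            if manifest.get(name) != sha256_hex(data)]

# A tiny delivered package: one article file plus its manifest entry.
article = b"<article>...</article>"
files = {"vol12/issue3/art001.xml": article}
manifest = {"vol12/issue3/art001.xml": sha256_hex(article)}
failures = verify_package(files, manifest)  # empty list: package passes
```

In a real workflow this check would sit alongside metadata extraction and organization by title/volume/issue before files move to their storage locations.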
My Library of Congress colleague Erik Delfino reached out to our colleagues at the National Institutes of Health/National Library of Medicine who operate PubMed Central and deal with similar issues. Out of our shared interest has come a NISO working group to develop a protocol for the transfer and exchange of files called PESC – Protocol for Exchanging Serial Content. This group is co-chaired by the Library of Congress and NIH, and has representatives from publishers small and large, data normalizers and aggregators, preservation organizations, and organizations with an interest in copyright issues.
This group is making great progress identifying the scope of the problem, looking at how a variety of organizations solve the problem for their own operations, and drafting its ideas for solutions for exchange that support the effective management and preservation of serials.
If you are interested in the work, please read the Work Item description at the PESC web site, and check out who’s involved. There will also be a brief update presented as part of the NISO standards session at ALA Midwinter on Sunday, January 26 from 1-2:30 PM in Pennsylvania Convention Center room 118 C.
We hear a constant stream of news about how crunching massive data collections will change everything from soup to nuts. Here on The Signal, it’s fair to say that scientific research data is close to the heart of our hopes, dreams and fears when it comes to big data: we’ve written over two-dozen posts touching on the subject.
In the context of all this, it’s exciting to see some major projects getting underway that have big data stewardship closely entwined with their efforts. Let me provide two examples.
The Registry of Data Repositories seeks to become a global registry of “repositories for the permanent storage and access of data sets” for use by “researchers, funding bodies, publishers and scholarly institutions.” The activity is funded by the German Research Foundation through 2014 and currently has 400 repositories listed. With the express goal to cover the complete data repository landscape, re3data.org has developed a typology of repositories that complements existing information offered by individual institutions. The aim is to offer a “systematic and easy to use” service that will strongly enhance data sharing. Key to this intent is a controlled vocabulary that describes repository characteristics, including policies, legal aspects and technical standards.
In a bow to the current trend for visual informatics, the site also offers a set of icons with variable values that represent repository characteristics. The project sees the icons as helpful to users and as a way to help repositories “identify strengths and weaknesses of their own infrastructures” and keep the information up to date.
I really like this model. It hits the trifecta in appealing to creators who seek to deposit data, to users who seek to find data and to individual repositories who seek to evaluate their characteristics against their peers. It remains to be seen if it will scale and if it can attract ongoing funding, but the approach is elegant and attractive.
The second example is ELIXIR, an initiative of the EMBL European Bioinformatics Institute. ELIXIR aims to “orchestrate the collection, quality control and archiving of large amounts of biological data produced by life science experiments,” and “is creating an infrastructure – a kind of highway system – that integrates research data from all corners of Europe and ensures a seamless service provision that is easily accessible to all.”
This is a huge undertaking and has the support of many nations who are contributing millions of dollars to build a “hub and nodes” network. It will connect public and private bioscience facilities throughout Europe and promote shared responsibility for biological data delivery and management. The intention is to provide a single interface to hundreds of distributed databases and a rich array of bioinformatics analysis tools.
ELIXIR is a clear demonstration of how a well-articulated need can drive massive investment in data management. The project has a well-honed business case that presents an irresistible message. “Biological information is of vital significance to life sciences and biomedical research, which in turn are critical for tackling the Grand Challenges of healthcare for an ageing population, food security, energy diversification and environmental protection,” reads the executive summary. “The collection, curation, storage, archiving, integration and deployment of biomolecular data is an immense challenge that cannot be handled by a single organisation.” This is what the Blue Ribbon Task Force on Sustainable Digital Preservation and Access termed “the compelling value proposition” needed to drive the enduring availability of digital information.
As a curious aside, it’s worth noting that projects such as ELIXIR may have an unexpected collateral impact on data preservation. Ewan Birney, a scientist and administrator working on ELIXIR, was so taken with the challenge of what he termed “a 10,000 year archive” holding a massive data store that he and some colleagues (over a couple of beers, no less) came up with a conjecture for how to store digital data using DNA. The idea was sound enough to merit a letter in Nature, Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. So, drawing the attention of bioinformaticians and other scientists to the digital preservation challenge may well lead to stunning leaps in practices and methods.
Perhaps one day the biggest of big data can even be reduced to the size of a bowl of alphabet soup or a bowl of mixed nuts!
The 2014 National Digital Stewardship Agenda, released in July 2013, is still a must-read (have you read it yet?). It integrates the perspective of dozens of experts to provide funders and decision-makers with insight into emerging technological trends, gaps in digital stewardship capacity and key areas for development.
The Agenda suggests a number of important research areas for the digital stewardship community to consider, but the need for more coordinated applied research in cost modeling and sustainability is high on the list of areas ripe for research and scholarship.
The section in the Agenda on “Applied Research for Cost Modeling and Audit Modeling” suggests some areas for exploration:
“Currently there are limited models for cost estimation for ongoing storage of digital content; cost estimation models need to be robust and flexible. Furthermore, as discussed below…there are virtually no models available to systematically and reliably predict the future value of preserved content. Different approaches to cost estimation should be explored and compared to existing models with emphasis on reproducibility of results. The development of a cost calculator would benefit organizations in making estimates of the long‐term storage costs for their digital content.”
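As a toy illustration of what such a cost calculator might look like, here is a sketch that projects long-term storage costs under a declining price-per-terabyte assumption. Every parameter and default value is invented for illustration; none are drawn from the Agenda or any published model:

```python
def storage_cost(tb, years, price_per_tb_year=100.0,
                 annual_decline=0.15, copies=2):
    """Rough total cost of keeping `tb` terabytes for `years` years,
    assuming the yearly price per TB falls by `annual_decline` each
    year and the repository holds `copies` replicas.

    All defaults are hypothetical placeholders for illustration.
    """
    total = 0.0
    price = price_per_tb_year
    for _ in range(years):
        total += tb * copies * price   # pay for every replica this year
        price *= 1.0 - annual_decline  # storage gets cheaper over time
    return round(total, 2)

# e.g. 10 TB, 3 years, 2 copies: 2000 + 1700 + 1445 = 5145.0
estimate = storage_cost(10, 3)
```

Real models (LIFE, Keeping Research Data Safe and others in the bibliography below) are far richer, folding in staff time, ingest, migration and access costs, which is exactly why the Agenda calls for comparing approaches with an emphasis on reproducibility of results.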
In June of 2012 I put together a bibliography of resources touching on the economic sustainability of digital resources. I’m pleasantly surprised at all the new work that’s been done in the meantime, but as the Agenda suggests, there’s more room for directed research in this area. Or perhaps, as Paul Wheatley suggests in this blog post, what’s really needed are coordinated responses to sustainability challenges that build directly on this rich body of work, and that effectively communicate the results out to a wide audience.
I’ve updated the bibliography, hoping that researchers and funders will explore the existing body of projects, approaches and research, note the gaps in coverage suggested by the Agenda and make efforts to address the gaps in the near future through new research or funding.
As always, we welcome any additions you might have to this list. Feel free to leave suggestions in the comments.
The Web site addresses listed here were all valid as of January 14, 2014.
Allen, Alexandra. “General Study 16 – Cost Benefit Models: Final Report.” InterPARES3 Project; 2013. Available at http://www.interpares.org/ip3/display_file.cfm?doc=ip3_canada_gs16_final_report.pdf
Arrow, Kenneth, Robert Solow, Paul R. Portney, Edward E. Leamer, Roy Radner, and Howard Schuman. “Report of the NOAA Panel on Contingent Valuation.” National Oceanic and Atmospheric Administration. 1993. Available at http://www.darrp.noaa.gov/library/pdf/cvblue.pdf
Ayris, P.; Davies, R.; McLeod, R.; Miao, R.; Shenton, H.; Wheatley, P. The LIFE2 final project report. LIFE Project: London, UK. 2008. Available at http://discovery.ucl.ac.uk/11758/
Barlow, John Perry. “The Economy of Ideas: Selling Wine Without Bottles on the Global Net.” See especially the section entitled Relationship and Its Tools. Available at http://homes.eff.org/~barlow/EconomyOfIdeas.html
Beagrie, N., Chruszcz, J., and Lavoie, B. Keeping Research Data Safe: A Cost Model and Guidance for UK Universities. Final Report. April 2008. Available at http://www.jisc.ac.uk/media/documents/publications/keepingresearchdatasafe0408.pdf
Beagrie, N., Lavoie, B., and Woollard, M. Keeping Research Data Safe 2. Final Report. April 2010. Available at http://www.jisc.ac.uk/media/documents/publications/reports/2010/keepingresearchdatasafe2.pdf
Blue Ribbon Task Force on Sustainable Digital Preservation and Access. Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information. February 2010. Available at http://brtf.sdsc.edu/biblio/BRTF_Final_Report.pdf
Blue Ribbon Task Force on Sustainable Digital Preservation and Access. Sustaining the Digital Investment: Issues and Challenges of Economically Sustainable Digital Preservation. December 2008. Available at http://brtf.sdsc.edu/biblio/BRTF_Interim_Report.pdf
Botea, Juanjo, Belen Fernandez-Feijoo and Silvia Ruiz. “The Cost of Digital Preservation: A Methodological Analysis.” Procedia Technology, Vol. 5; 2012. Available at http://www.sciencedirect.com/science/article/pii/S2212017312004434
Brown, Adrian. “Cost Modeling: The TNA Experience.” The National Archives (UK). PowerPoint slides presented at the DCC/DPC joint Workshop on Cost Models, held July 26, 2005. Available at http://www.dpconline.org/docs/events/050726brown.pdf
Buckland, Michael K. “Information as Thing.” Journal of the American Society for Information Science; Jun 1991; 42, 5; pg. 351-360. Available at http://people.ischool.berkeley.edu/~buckland/thing.html
Cantor, Nancy, and Paul N. Courant. “Scrounging for Resources: Reflections on the Whys and Wherefores of Higher Education Finance.” New Directions for Institutional Research, Volume 2003, Issue 119 , Pages 3 – 12. Also available as “Scrounge We Must–Reflections on the Whys and Wherefores of Higher Education Finance” at http://www.provost.umich.edu/speeches/higher_education_finance.html
Chambers, Catherine M., Paul E. Chambers and John C. Whitehead. “Contingent Valuation of Quasi-Public Goods: Validity, Reliability, and Application to Valuing a Historic Site.” Available at http://faculty.ucmo.edu/pchambers/adobe/historical.pdf
Chapman, Stephen. “Counting the Costs of Digital Preservation: Is Repository Storage Affordable?” Journal of Digital Information, Volume 4 Issue 2. 2003. Available at http://journals.tdl.org/jodi/article/view/100
Charles Beagrie Ltd. and JISC. Keeping Research Data Safe Factsheet. 2011. Available at http://beagrie.com/KRDS_Factsheet_0711.pdf
Charles Beagrie Ltd and the Centre for Strategic Economic Studies (CSES), University of Victoria. “Economic Impact Evaluation of the Economic and Social Data Service.” 2012. Available at http://www.esrc.ac.uk/_images/ESDS_Economic_Impact_Evaluation_tcm8-22229.pdf
Crespo, Arturo, Hector Garcia-Molina. “Cost-Driven Design for Archival Repositories.” Joint Conference on Digital Libraries 2001 (JCDL’01); June 24-28, 2001; Roanoke, Virginia, USA. Available at http://www-db.stanford.edu/~crespo/publications/cost.pdf
Currall, James, Claire Johnson, and Peter McKinney. “The Organ Grinder and the Monkey. Making a business case for sustainable digital preservation.” Presentation given at EU DLM Forum Conference 5-7 October 2005 Budapest, Hungary. Available at http://hdl.handle.net/1905/455
Currall, James, Claire Johnson, and Peter McKinney. “The world is all grown digital…. How shall a man persuade management what to do in such times?” 2nd International Digital Curation Conference, Digital Data Curation in Practice, 21-22 November 2006, Hilton Glasgow Hotel, Glasgow. Available at http://hdl.handle.net/1905/690
Currall, James, and Peter McKinney. “Investing in Value: A Perspective on Digital Preservation.” D-Lib Magazine, Volume 12, Number 4; April 2006. Available at http://www.dlib.org/dlib/april06/mckinney/04mckinney.html
Davies, Richard, Paul Ayris, Rory McLeod, Helen Shenton and Paul Wheatley. “How much does it cost? The LIFE Project – Costing Models for Digital Curation and Preservation.” LIBER Quarterly, Vol. 17, no. 3/4, 2007. Available at http://liber.library.uu.nl/index.php/lq/article/view/7895
Digital Preservation Coalition. “Report for the DCC/DPC Workshop on Cost Models for Preserving Digital Assets.” Available at http://www.dpconline.org/events/previous-events/137-cost-models. A series of PowerPoint presentations from a day-long workshop held on July 26, 2005.
“Erpa Guidance: Cost Orientation Tool.” 2003. Available at http://www.erpanet.org/guidance/docs/ERPANETCostingTool.pdf
“espida Handbook: Expressing project costs and benefits in a systematic way for investment in information and IT.” University of Glasgow/JISC. 2007. Available at https://dspace.gla.ac.uk/bitstream/1905/691/1/espida_handbook_web.pdf
Fontaine, Kathy, Greg Hunolt, Arthur Booth and Mel Banks. “Observations on Cost Modeling and Performance Measurement of Long-Term Archives.” NASA Goddard Space Flight Center, Greenbelt, MD. 2007. Available at http://www.pv2007.dlr.de/Papers/Fontaine_CostModelObservations.pdf
Ghosh, Rishab Aiyer. “Cooking Pot Markets: an Economic Model for the Trade in Free Goods and Services on the Internet.” First Monday, Issue 3_3, 1998. Available at http://www.firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1516/1431
Granger, Stewart, Kelly Russell, and Ellis Weinberger: “Cost elements of Digital Preservation (version 4).” October 2000. Available at http://www.webarchive.org.uk/wayback/archive/20050409230000/http://www.leeds.ac.uk/cedars/colman/costElementsOfDP.doc
Griffin, Vanessa, Kathleen Fontaine, Gregory Hunolt, Arthur Booth, and David Torrealba. “Cost Estimation Tool Set for NASA’s Strategic Evolution of ESE Data Systems.” NASA. Unknown date. Available at http://vds.cnes.fr/manifestations/PV2002/DATA/5-8_griffin.pdf
Guthrie, Kevin, Rebecca J. Griffiths, Nancy L. Maron. Sustainability and Revenue Models for Online Academic Resources. Ithaka; 2008. Available at http://www.sr.ithaka.org/research-publications/sustainability-and-revenue-models-online-academic-resources
Hahn, Robert W. and Paul C. Tetlock. “Using Information Markets to Improve Public Decision Making.” AE_-Brookings Joint Center for Regulatory Studies; 2005. Available at http://www.law.harvard.edu/students/orgs/jlpp/Vol29_No1_Hahn_Tetlock.pdf
Hendley, Tony. “Comparison of Methods & Costs of Digital Preservation.” British Library Research and Innovation Report 106; 1998. Available at http://www.ukoln.ac.uk/services/elib/papers/tavistock/hendley/hendley.html
Hunter, Laurie, Elizabeth Webster and Anne Wyatt. “Measuring Intangible Capital: A Review of Current Practice.” Intellectual Property Research Institute of Australia Working Paper No. 16/04; 2005. Available at http://www.ipria.net/publications/wp/2004/IPRIAWP16.2004.pdf
Hunter, Laurie. “DCC Digital Curation Manual: Investment in an Intangible Asset.” University of Glasgow. 2006. Available at http://www.era.lib.ed.ac.uk/bitstream/1842/3340/1/Hunter%20intangible-asset.pdf
Iansiti, Marco, and Gregory L. Richards. “The Business of Free Software: Enterprise Incentives, Investment, and Motivation in the Open Source Community.” Harvard Business School. 2006. Preliminary draft available at http://www.hbs.edu/research/pdf/07-028.pdf
Kaufman, Peter B. “Assessing the Audiovisual Archive Market: Models and Approaches for Audiovisual Content Exploitation.” Presto Centre. 2013. Available at https://www.prestocentre.org/library/resources/assessing-audiovisual-archive-market
Kaur, Kirnn, Patricia Herterich, Suenje Dallmeier-Tiessen, Karlheinz Schmitt, Sabine Schrimpf, Heiko Tjalsma, Simon Lambert and Sharon McMeekin. D32.1 Report on Cost Parameters for Digital Repositories. Alliance for Permanent Access to the Records of Science Network. 2013. Available at http://www.alliancepermanentaccess.org/wp-content/uploads/downloads/2013/03/APARSEN-REP-D32_1-01-1_0.pdf
Kejser, Ulla Bøgvad, Anders Bo Nielsen and Alex Thirifays. Cost Model for Digital Preservation:Cost of Digital Migration. International Journal of Digital Curation, Issue 1, Vol. 6; 2011. Available at http://www.ijdc.net/index.php/ijdc/article/viewFile/177/246
King, Dennis M. King and Marisa Mazzotta. “Ecosystem Valuation.” Available at http://www.ecosystemvaluation.org/index.html. While this website pertains to considerations of natural environment valuation, its findings are applicable to the consideration of other intangible asset economies, such as the economic system surrounding digital preservation.
James, Hamish, Raivo Ruusalepp, Sheila Anderson, and Stephen Pinfield. “Feasibility and Requirements Study on Preservation of E-Prints.” JISC; 2003. Pg. 41-55. Available at http://www.sherpa.ac.uk/documents/feasibility_eprint_preservation.pdf
Lavoie, Brian. “Of Mice and Memory: Economically Sustainable Preservation for the Twenty-first Century.” Found in Access in the Future Tense. CLIR; 2004. Pg. 45-54. Available at http://www.clir.org/pubs/reports/pub126/pub126.pdf
Lavoie, Brian. “The Fifth Blackbird: Some Thoughts on Economically Sustainable Digital Preservation.” D‐Lib Magazine, Vol. 14, no. 3/4. March/April 2008. Available at http://www.dlib.org/dlib/march08/lavoie/03lavoie.html
Lavoie, Brian. “The Incentives to Preserve Digital Materials: Roles, Scenarios, and Economic Decision-Making.” OCLC Office of Research; 2003. Available at http://www.oclc.org/research/projects/digipres/incentives-dp.pdf
Lifecycle Information for E-literature: An Introduction to the third phase of the LIFE project. JISC/RIN. 2010. Available at http://www.life.ac.uk/3/docs/life3_report.pdf
Longhorn, Roger, and Michael Blakemore. “Re-visiting the Valuing and Pricing of Digital Geographic Information.” Journal of Digital Information 4, (2). 2003. Available at http://journals.tdl.org/jodi/article/viewFile/103/102
Machlup, Fritz. Knowledge: Its Creation, Distribution, and Economic Significance. Volume I: Knowledge and Knowledge Production. Princeton University Press; 1980.
Machlup, Fritz. Knowledge: Its Creation, Distribution, and Economic Significance. Volume III: The Economics of Information and Human Capital. Princeton University Press; 1984.
Maron, Nancy L., K. Kirby Smith, Matthew Loy. Sustaining Digital Resources: An On-the-Ground View of Projects Today. Ithaka; 2009. Available at http://www.sr.ithaka.org/research-publications/sustaining-digital-resources-ground-view-projects-today
Maron, Nancy L., Matthew Loy. Revenue, Recession, Reliance: Revisiting the SCA/Ithaka S+R Case Studies in Sustainability. Ithaka; 2011. Available at http://www.sr.ithaka.org/research-publications/revenue-recession-reliance-revisiting-scaithaka-sr-case-studies-sustainability
McLeod, Rory, Paul Wheatley, and Paul Ayris. “Lifecycle information for E-literature: Full Report from the LIFE Project.” LIFE Project, London, UK. 2006. Available at http://eprints.ucl.ac.uk/archive/00001854/01/LifeProjMaster.pdf
Moore, Richard L., Jim D’Aoust, Robert H. McDonald, and David Minor. Disk and Tape Storage Cost Models. San Diego Supercomputer Center, University of California San Diego; La Jolla, CA, USA. 2007. Available at http://users.sdsc.edu/~mcdonald/content/papers/dt_cost.pdf
Morrissey, Sheila. “The Economy of Free and Open Source Software in the Preservation of Digital Artifacts.” Library Hi Tech, Vol. 28 Iss: 2; 2010. Available at http://www.portico.org/digital-preservation/wp-content/uploads/2010/11/The-Economy-of-Free-and-Open-Source-Software-in-the-Preservation-of-Digital-Artifacts.pdf
Oltmans, Erik. “Cost Models in Digital Archiving.” Presentation at LIBER 2004 , Life Cycle Collection Management, St. Petersburg, July 1, 2004. Available at http://liber.library.uu.nl/index.php/lq/article/view/7789/7908
Oltmans, Erik, and Nanda Kol. “A Comparison Between Migration and Emulation in Terms of Costs.” RLG Diginews Volume 9, Number 2; 2005. Available at http://worldcat.org/arcviewer/2/OCC/2009/08/11/H1250012115408/viewer/file2.html
Palaiologk, Anna S., Anastasios A. Economides, Heiko D. Tjalsma and Laurents B. Sesink. “An Activity-based Costing Model for Long-term Preservation and Dissemination of Digital Research Data: the Case of DANS.” International Journal on Digital Libraries, Volume 12, Issue 4, 2012. Available at http://link.springer.com/article/10.1007%2Fs00799-012-0092-1
Palm, Jonas. “The Digital Black Hole.” Riksarkivet/National Archives Sweden. Available at http://www.tape-online.net/docs/Palm_Black_Hole.pdf
Perens, Bruce. “The Emerging Economic Paradigm of Open Source.” First Monday Special Issue #2: Open Source. October 3, 2005. Available at http://www.firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1470/1385
Phillips, Margaret E. “Selective Archiving of Web Resources: A Study of Acquisition Costs at the National Library of Australia.” RLG DigiNews, Volume 9, Number 3. Available at http://www.nla.gov.au/openpublish/index.php/nlasp/article/view/1229
Rosenthal, David. “Modeling the Economics of Long-Term Storage.” DSHR’s Blog. 2011. Available at http://blog.dshr.org/2011/09/modeling-economics-of-long-term-storage.html
Sanett, Shelby. “The Cost to Preserve Authentic Electronic Records in Perpetuity: Comparing Costs across Cost Models and Cost Frameworks.” RLG Diginews, August 15, 2003, Volume 7, Number 4. Available at http://library.oclc.org/cdm/singleitem/collection/p267701coll33/id/366
Sanett, Shelby. “Toward Developing a Framework of Cost Elements for Preserving Authentic Electronic Records into Perpetuity.” College and Research Libraries 63 (5):388-404. 2002. Available at http://crl.acrl.org/content/63/5/388.full.pdf
Slats, Jacqueline and Remco Verdegem. “Cost Model for Digital Preservation.” Nationaal Archief of the Netherlands. 2005. Available at http://dlmforum.typepad.com/Paper_RemcoVerdegem_and_JS_CostModelfordigitalpreservation.pdf
Smith, David M. “The Cost of Lost Data.” Graziadio Business Report, Volume 6, Issue 3: 2003. Available at http://gbr.pepperdine.edu/033/dataloss.html
Strodl, Stephan, and Andreas Rauber. “A Cost Model for Small Scale Automated Digital Preservation Archives.” International Conference on Preservation of Digital Objects 2011. Available at http://www.ifs.tuwien.ac.at/~strodl/paper/strodl_ipres2011_costmodel.pdf
Throsby, David. “Determining the Value of Cultural Goods: How Much (or How Little) Does Contingent Valuation Tell Us?” Journal of Cultural Economics 27: 275–285, 2003. Available at http://culturalheritage.ceistorvergata.it/virtual_library/Art_THROSBY_D-Determining_the_Value_of_Cultural_Goods_-.pdf
Torre, Marta de la, editor. “Assessing the Values of Cultural Heritage: Research Report.” The Getty Conservation Institute; 2002. Available at http://www.getty.edu/conservation/publications_resources/pdf_publications/pdf/assessing.pdf
UC3 Curation Center. “Total Cost of Preservation (TCP): Cost and Price Modeling for Sustainable Services.” 2013. Available at https://wiki.ucop.edu/download/attachments/163610649/TCP-cost-price-modeling-for-sustainable-services-v2_1.pdf?version=4&modificationDate=1375721821000
Walters, Tyler and Katherine Skinner. “Economics, Sustainability, and the Cooperative Model in Digital Preservation.” Library High Tech, Vol. 28, no. 2, 2010. Available at http://www.emeraldinsight.com/journals.htm?articleid=1864753
Wellcome Trust. “Costs and business models in scientific research publishing.” SQW; 2004. Available at http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003184.pdf
Wellcome Trust. “Economic analysis of scientific research publishing: A report commissioned by the Wellcome Trust.” SQW; 2003. Available at http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtd003182.pdf
Wheatley, Paul, P. Ayris, R. Davies, R. Mcleod and H. Shenton. “The LIFE Model v1.1. Discussion paper.” LIFE Project, London, UK. 2007. Available at http://eprints.ucl.ac.uk/4831/1/4831.pdf
Wheatley, Paul and Brian Hole. LIFE3: Predicting Long Term Digital Preservation Costs. LIFE3 Project, London, UK. 2009. Available at http://www.life.ac.uk/3/docs/ipres2009v24.pdf
Wright, Richard, Ant Miller and Matthew Addis. “The Significance of Storage in the “Cost of Risk” of Digital Preservation.” International Journal of Digital Curation, Vol. 4, No. 3; 2009. Available at http://www.ijdc.net/index.php/ijdc/article/view/138
This is a guest post by Abbie Grotke, Library of Congress Web Archiving Team Lead and Co-Chair of the National Digital Stewardship Alliance Content Working Group
You may have seen the news on this blog and elsewhere that the National Digital Stewardship Alliance launched the first-ever National Agenda for Digital Stewardship last July. One major section of that document addresses digital content areas. Here’s an excerpt:
Both born‐digital and digitized content present a multitude of challenges to stewards tasked with preservation: the size of data requiring preservation, the selection of content when the totality cannot be preserved, and the selection of modes of both content storage and format migration to ensure long‐term preservation.
Digital stewardship planning must go beyond a focus on content we already have and technology already in use. Even in the near term, a number of trends are evident. Given the ever-growing quantity of digital content being produced, scalability is an immediate concern. More and more people globally have access to tools and technologies to create digital content, increasingly with mobile devices equipped with cameras and apps developed specifically for the generation and dissemination of digital content. Moreover, the web continues to be a publishing mechanism for individuals, organizations, and governments, as publishing tools become easier to use. In light of these trends, the question of how to deal with “big data” is a major concern for digital preservation communities.
Selection is increasingly a concern with digital content. With so much data, how do we decide what to preserve? Again, from the agenda:
Content selection policies vary widely depending on the organization and its mission, and when addressing its collections, each organization must discuss and decide upon approaches to many questions. While selection policies for traditional content are most often topically organized, digital content categories, described here, present specific challenges. In the first place, there is the challenge of countering the public expectation that everything digital can be captured and preserved — stewards must educate the stakeholders on the necessity of selection. Then there are the general organizational questions that apply to all digital preservation collections. For example, how to determine the long‐term value of content?
Audiences increasingly desire not only access, but enhanced use options and tools for engaging with digital content. Usability is increasingly a fundamental driver of support for preservation, particularly for ongoing monetary support. Which stakeholders should be involved and represented in these determinations? Of the content that is of interest to stakeholders, what is at risk and must be preserved? What are appropriate deselection policies? What editions/versions, expressions and manifestations (e.g. items in different formats) should be selected?
Members of the NDSA’s Content Working Group contributed to the 2014 Agenda by discussing what content was particularly challenging to them. Report writers then drafted sections of the Agenda to focus on the particular challenges of each of the four identified content areas:
- Electronic Records
- Research Data
- Web and Social Media
- Moving Image and Recorded Sound
One simple thing we are doing within the NDSA Content Working Group is holding dedicated meetings focusing on each of the four areas listed above, so that members can learn more and share information about specific challenges, tools in use or being developed and so forth.
The first of these meetings was held December 4, 2013 and focused on web and social media. I provided an overview of web archiving: why web and social media are being archived, who is doing what, and what challenges we face, whether social, ethical, legal, or technical. A PDF of my slides is here. Kris Carpenter from the Internet Archive followed and spoke about the “Challenges of Collecting and Preserving the Social Web.” A PDF of her slides is here.
In January we’ll be focusing on electronic records, and later this spring we’ll have sessions on moving image and recorded sound as well as research data. If you’d like to get in on those conversations, join us in the NDSA!
We don’t claim that the issues surrounding any of the four content types will all be solved over the course of the year, or that these are the only content areas that our members and the broader digital preservation community are dealing with. Who knows what the 2015 Agenda will bring us! But we do hope that by drawing more attention to the challenges we are facing, more research, tools development and related efforts will help advance the work of stewards charged with caring for these digital content areas.
The Library Company of Philadelphia will be hosting Philadelphia’s first National Digital Stewardship Alliance (NDSA) Regional Meeting and Unconference on January 23 and 24. This is part of an initiative across the country for NDSA member organizations to host day-long events, or “NDSA Regional Meetings,” that provide networking and collaboration opportunities for members and highlight the work of regional institutions.
If you’re local to the Philadelphia region or if you’ll be in town for ALA Midwinter, I’d encourage you to check out the program. It’s a free event (!) and there are a few excellent reasons you’ll want to attend.
Learn More About the NDSA
The NDSA is a dynamic organization with more than 150 partner organizations, including universities, government and nonprofit organizations, commercial businesses, and professional associations. It’s self-organized, with its work decided and driven by professionals contributing to five working groups. The NDSA recently celebrated its third birthday and you can read more about its history and accomplishments here.
At the Philly Regional Meeting, there will be two talks on the NDSA: one on the Levels of Preservation and another on the NDSA and the National Agenda for Digital Stewardship. If you aren’t an NDSA member but you’re interested in hearing whether the NDSA is a good fit for your organization, please consider attending.
Everyone Wants Standards for Digital Preservation
This Regional Meeting will focus on standards in digital preservation and how different communities use them to preserve and manage their digital collections. On Thursday evening you’ll hear from speakers like Emily Gore from the Digital Public Library of America (DPLA), Ian Bogus from the University of Pennsylvania Libraries, and George Blood from George Blood Video on their different approaches to the metadata standards used to manage their digital resources. On Friday morning you’ll have the opportunity to collaborate during the unconference on specific challenges or issues, on any topic you want to explore with your fellow practitioners in a fun, informal way.
Connect Locally to Your Professional Peers
Creating professional relationships is important, and staying connected to what’s going on in your field is equally important. NDSA Regional Meetings are great professional development opportunities: a chance to connect and network with a local community of practice for digital stewardship. You’ll have the chance to meet face-to-face with your professional peers, ask for advice or help, share ideas and work, and generally broaden your knowledge of digital stewardship issues. If your organization is an NDSA member, this is a great time to meet with others in the area. And as I mentioned before, even if your organization isn’t a member but you are local to the Philly area, you’re encouraged to attend!
This is the third NDSA Regional Meeting. The Boston Regional Meeting took place in May 2013, organized and hosted by WGBH and Harvard Library. Metropolitan New York Library Council hosted the NYC Regional Meeting last June. Other NDSA member organizations have expressed interest in organizing and hosting regional meetings later in 2014 in other parts of the country (DC-metro area and in the Midwest).
For the Philadelphia Regional Meeting, registration for the unconference on Friday, January 24 is sold out, but there are plenty of spots open for the Thursday, January 23 reception and talks.
Register for #NDSAPhilly today. We’d love to see you there!
Following is a guest blog post from Lisa Shiota, a student at Drexel University School of Information and Library Science and a staff member in the Music Division at the Library of Congress. She explains how she utilized Viewshare in a digital library technologies class.
I am currently finishing classes towards a post-graduate certificate in digital libraries through Drexel University’s online program. This past fall, for my Digital Library Technologies class, our final project was to create a digital library prototype. After looking at several open source applications for digital libraries, I chose Viewshare for my project.
What was particularly appealing to me about Viewshare was the different ways (or “views”) that the information could be presented. I figured it would be worth a try to see how easy it was to use. I requested an account from the moderator using their online form, and once I was approved, I created a login.
Plan and Approach
My plan to build the prototype was fairly simple, at least on paper.
- Identify the physical collection to be used for this project
- Read through the Help pages to learn how to use the system (http://viewshare.org/about/help/)
- Upload smaller test files to see how the system works
- Decide what metadata to record
- Scan covers/title pages
- Upload data
- Build one “view”
- Test interfaces and analyze results
The items I chose for the digital library project are opera scores by Giuseppe Verdi, a 19th-century Italian composer best known for his operas. The Music Division of the Library of Congress, where I am currently working, has most of Verdi’s operas in one print format or another. Although it seemed somewhat limiting to focus on one composer, I wanted to note the contrasting aspects of the collection. For example, most of the items have text in the original language, but there are some that have been translated into other languages. Many of the scores are the first printed editions, but there are several reprints that are represented as well. There are many items that are in manuscript; these are mostly by copyists who had viewed a printed score that had not been available in the United States and had painstakingly made a handwritten copy to add to the library’s collection. These copies are often in extremely brittle condition; many of the handwritten copies and the first printed editions have been copied to microfilm so that a legible, more durable copy could be preserved and made available to library patrons.
After playing around with uploading different kinds of files, I opted to upload a spreadsheet with the items’ metadata. Much of the metadata I chose to compile in my spreadsheet is standard for library bibliographic records: composer, title, publication information, extent (number of pages/volumes), format, language, and call number. I added a couple of fields for internal tracking purposes: a link to the library’s OPAC record where available, and the shelving number for the microfilm version. The notes for each record are mostly my own: basic points I found noteworthy about the item.
I scanned the covers (or title pages, in absence of a cover) of the opera scores on a flatbed scanner and saved the images as .jpgs on my personal webspace on my school server. I then added the image URLs to the spreadsheet.
Lastly, I chose to include certain metadata – preferred title, librettists, and performance dates – solely for the purpose of being able to explore the available Viewshare presentations. I wanted to use the preferred title (or uniform title) of a work so that I could group items representing the same work together even if they had different titles on their covers or title pages. I wanted to highlight the names of the original librettists for searching purposes. I recorded the dates of the first performances of the operas (from the “Giuseppe Verdi” entry in Oxford Music Online) so that I could experiment with the timeline view.
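A spreadsheet like the one described above can be sketched as a plain CSV file. The column names below mirror the fields named in this post, but the exact headings and the sample record are illustrative, not taken from Shiota’s actual data; Viewshare maps whatever columns you upload.

```python
import csv

# Columns mirroring the metadata fields described in the post.
# The headings themselves are illustrative.
FIELDS = [
    "composer", "title", "preferred_title", "librettist", "first_performance",
    "publication_info", "extent", "format", "language", "call_number",
    "opac_url", "microfilm_shelf_number", "image_url", "notes",
]

# One hypothetical row for a Verdi vocal score.
rows = [{
    "composer": "Verdi, Giuseppe, 1813-1901",
    "title": "Rigoletto",
    "preferred_title": "Rigoletto",
    "librettist": "Piave, Francesco Maria",
    "first_performance": "1851-03-11",
    "publication_info": "Milan: Ricordi",
    "extent": "1 vocal score",
    "format": "vocal score",
    "language": "Italian",
    "call_number": "",            # local call number, when assigned
    "opac_url": "",               # link to the library's OPAC record, when available
    "microfilm_shelf_number": "", # shelving number for the microfilm version
    "image_url": "",              # URL of the scanned cover/title-page image
    "notes": "First printed edition.",
}]

with open("verdi-scores.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
```

Keeping the internal-tracking fields (OPAC link, microfilm number) as ordinary columns means they travel with the record but can simply be left out of any public view.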
My final version of my digital library prototype includes List, Table, Timeline, and Gallery views, as well as facets for browsing by score format, language, and librettists, and is publicly available at http://viewshare.org/views/lshiota/verdi-scores/.
This project taught me a lot about the many components involved in creating a digital library. Based on the results of this prototype, I concluded that digitizing the library’s entire opera collection of several hundred items and making it available through Viewshare would be too cumbersome. Other, smaller collections, such as the division’s archival collections containing short correspondence, sketches, or photographs, would work better here. Viewshare’s built-in interfaces for maps, timelines, and graphs would be a great way for users to interact with a digital collection in ways they might not be able to with the physical collection.
The January 2014 issue of the Library of Congress Digital Preservation Newsletter (pdf) is now available!
Included in this issue:
- Two digital preservation pioneers: Steve Puglia and Gary Marchionini
- New NDSA Report: Staffing for Digital Preservation
- GIS Data at Montana State Library
- Upcoming events: NDSA regional meeting, ALA Midwinter, International Digital Curation Conference
- Interviews with W. Walter Sampson, Mitch Fraas and Cal Lee
- More NDSA news, articles on resources, web archiving and more
To subscribe to the newsletter, sign up here.
It was raining hard on Sunday morning November 17, 2013, as librarian Genna (pronounced “Gina”) Buhr anxiously watched the Weather Channel coverage of the storm system battering central Illinois. Buhr, Public Services Manager at Illinois’ Fondulac District Library, was visiting her parents in Utica, Illinois, about an hour north of Buhr’s home. Her two young children were with her. It had been a relaxed Sunday morning, with everyone lounging in their pajamas, but the increasing severity of the weather gradually changed the mood in the house. At 8:40 a.m., the National Weather Service issued a tornado watch.
Buhr’s parents were as anxious as she was; in 2004, a tornado hit their small valley community, killing nine people and damaging or destroying 100 homes. By mid-morning, Buhr and her parents changed their clothing, put on their shoes and prepared to take shelter in the small basement of the 100-year-old house if the emergency signal sounded. Buhr’s father had designated a secure spot in the old coal room and set aside jugs of emergency water, just in case. They were as ready as they could be.
The tornado formed southeast of East Peoria, not far from Fondulac District Library, and for almost an hour it moved steadily northeast, growing in strength as it traveled. Winds accelerated and peaked at 190 mph as the tornado ravaged the town of Washington, tossing cars, leveling houses and grinding everything in its path to rubble as it plowed on. Eventually, 46 miles away from where it touched down, it weakened and dissipated.
When Buhr got home she saw that her house was safe, with only a few small downed branches, so she left her kids in the care of her mother-in-law (her husband was in Florida) and went to check on the library.
The new Fondulac District Library building celebrated its grand opening to the public only two weeks before the tornado hit the area. The building was deliberately designed to be open, airy and filled with natural light. Three of the four exterior walls are almost completely glass, as is the three-story tower in the center of the building.
When Buhr arrived she was relieved to find her staff OK and the library barely damaged, so she almost immediately set about mobilizing her staff to help the tornado victims. Buhr had first-hand “aftermath” experience from helping her family clean up after the 2004 tornado in Utica, and she was inspired by the supportive community spirit — how a lot of volunteers just showed up to help. Similarly, she and the library staff resolved to offer whatever resources they could, beginning with a centralized information resource for the community.
That afternoon she and her staff compiled a web page packed with storm assistance information. They listed emergency phone numbers, phone numbers for utilities, phone numbers for claims divisions of insurance companies and contacts for charitable assistance. Buhr also managed the social media posts that appeared almost instantly after the storm. “In the immediate hours after a disaster, there’s a lot of miscommunication through normal channels,” said Buhr. “If people could contact the library, we’d do our best to get them the answers and information they needed using the resources we had available, including our knowledge of the community and our research skills.” The web page invited people to come to the library to use the electricity, the computers and the wifi, and to charge their phones. The library offered the use of a video camcorder so people could document damage. Or people could come in just for comfort. The web page stated, “Visit us to escape the elements – cold, wind, rain – or the stress for a moment or two. Read a book, the newspaper, or a magazine. Play a game. Unwind.”
Heather Evans, a patron of Fondulac District Library, has a special interest in preserving digital photos. Evans contacted Buhr to note that, in a post-disaster cleanup, damaged photos are often overlooked and discarded; Evans suggested that the library might be able to help the community digitize their damaged photos so the electronic copies could be backed up and preserved. Evans even offered to set up her personal copy-stand camera and to digitize photos for those affected by the disaster. Buhr thought it was a terrific, novel idea. “The project fit the features and priorities of the library in a unique way,” said Buhr. “We weren’t collecting water bottles or supplies or anything physical for distribution. Other organizations had that covered. We decided to rely on the skills and talents of our staff and volunteers to offer something equally meaningful and important that maybe other organizations could not.”
While doing some research for the project, Buhr came across Operation Photo Rescue, a 501(c)(3) charity organization of volunteer photography enthusiasts who help rescue and restore damaged photos, particularly after natural disasters. Buhr consulted with OPR’s president Margie Hayes about OPR’s methods, about how Fondulac District Library’s project might work, and to ask if OPR would be interested in collaborating. “We don’t have the Photoshop skills that Operation Photo Rescue’s volunteers do,” said Buhr. “We don’t have restoration capabilities here. But it would be a step in the right direction if we could at least get the digitization portion done.”
Within a few days, she had the commitment, the staff and the equipment for the project, which they dubbed Saving Memories. The next step was to get storage media on which members of the community could save their newly digitized photos. Buhr figured that some of the library’s vendors might have flash drives and thumb drives to spare, so she emailed them, explained the Saving Memories project and asked for donations of flash/USB drives. The response was overwhelming. Within days, the Fondulac District Library received more than 2,500 flash/USB drives. The library was ready. Once people had their scans in hand, all that remained was to back up and care for their digital photos in accordance with the Library of Congress guidelines.
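In the spirit of the Library of Congress personal archiving advice — keep more than one copy, and check your copies — caring for the scans could be as simple as comparing a backup folder against the originals with checksums. A minimal sketch (the directory layout and function names are hypothetical):

```python
import hashlib
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(original_dir: str, backup_dir: str) -> list[str]:
    """Compare every file under original_dir against the same-named file
    under backup_dir; return the relative paths that are missing or differ."""
    problems = []
    orig = Path(original_dir)
    for src in sorted(p for p in orig.rglob("*") if p.is_file()):
        rel = src.relative_to(orig)
        dst = Path(backup_dir) / rel
        if not dst.is_file() or checksum(src) != checksum(dst):
            problems.append(str(rel))
    return problems
```

Run periodically (e.g. `verify_backup("photos", "backup")`), this flags any scan that failed to copy or has silently changed, so a fresh copy can be made while a good one still exists.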
Less than two weeks after the tornado hit, Fondulac District Library set up Evans’ copy-stand camera scanning station and held its first Saving Memories session. To the staff’s disappointment, no one came.
“I did feel it was going to be a little early after the disaster, but it didn’t hurt to try it,” said Buhr. “It’s understandable though. It was a little too soon. People were still being reunited with their items, things that the storm blew away. They were still meeting basic needs, such as housing and transportation.” In fact, in the aftermath of the tornado around central Illinois, more than 1,000 homes were damaged or destroyed, 125 people were injured and three were killed. So Buhr and her staff understood that the community had more important priorities than scanning photos. The trauma was still fresh and people had bigger concerns. Even Operation Photo Rescue doesn’t go into an affected community right after a disaster. They let peoples’ lives settle down a bit first.
Buhr is not frustrated or deterred. She has more sessions scheduled. She is coordinating with Operation Photo Rescue to hold a large copy run — basically a rescue session — at Washington District Library on February 21 and 22. They will offer further digitization services the following weekend, February 28 and March 1, at Fondulac District Library.
Buhr and her staff are looking beyond Saving Memories’ original goal of helping people salvage and digitize their photos. “We’re regrouping and thinking logistically — and bigger — about how this service can best benefit the community,” she said.
Fondulac District Library hopes to eventually get its own copy-stand camera setup so it can continue to offer a sophisticated photo digitization service. But that raises staffing issues. A qualified staff person — one trained in photography and the equipment — has to run it, sessions have to be scheduled and someone has to maintain the equipment. Such services need to be thought through carefully. Still, it seems like a logical step in the library’s ongoing service to its community.
“We offer public computers, scanners and copiers,” said Buhr. “Why not also offer the community the use of a copy stand camera scanner?”
Buhr also plans to expand the scope of the project. Fondulac District Library may eventually use the equipment to scan historic photos from the library’s collections. “Part of the attention drawn by the launching of our new library is to our local history collection,” said Buhr. In the old library building, the collection was buried in the basement and not easily accessible. In the new library, the collection is prominently displayed and accessible in the Local History room. Buhr wants to digitize and promote the collection more aggressively.
The actions of Buhr and the staff of Fondulac District Library demonstrate that libraries can help their communities in unexpected ways, including digital preservation and personal digital archiving. Buhr said, “The project is a good match for Fondulac District Library in that — in response to a disaster — the project uses the resources and the archival and preservation spirit that libraries have. The project really takes advantage of the broad abilities of the library and the skills of librarians in a unique way. The mission of our Saving Memories project captures the essence of some of the missions of libraries in general — preservation, information and service to the community.”
The following is a guest post from Emily Reynolds, Resident with the World Bank Group Archives
For the next several months, the National Digital Stewardship Residents will be interrupting your regularly-scheduled Signal programming to bring you updates on our projects and the program in general. We’ll be posting on alternate weeks through the end of the residency in May, and we can’t wait to share all of the exciting work we’ve been doing. I’ll start off the series with a quick overview of how it’s been going so far, and what you can expect to hear about in future posts.
After participating in immersion workshops for the first two weeks of September, we’ve been working at our host organizations to tackle their toughest digital stewardship challenges. Our work has been interspersed with group meetings and outings to professional development events; most recently, we heard from NYU’s Howard Besser at an enrichment session for the residents and our mentors. His talk centered around the challenges of preserving user-generated digital content, such as correspondence, email, and the disorderly contents of personal hard drives. The National Security Archive also hosted us for a tour and discussion of their work, where we were able to learn about some of the most prized (and controversial) items in their collection.
A major component of the residency is encouraging and facilitating our attendance at, and participation in, professional conferences. We’ll be presenting twice at ALA Midwinter: a series of lightning talks at the ALCTS Digital Preservation Interest Group meeting, as well as slightly longer presentations at the Library of Congress’s booth. Stay tuned for more information about other conferences that we’ll be participating in, as well as our reports after the fact.
As part of the residency, we’ve been asked to provide updates on our projects on our individual blogs and Twitter accounts. You can follow our Twitter activity on this list, and find links to all of our blogs here. We’ll be coordinating some special features on our personal blogs over the coming months, including interviews with digital preservation practitioners, discussions with each other, and up-close explorations of our institutions and projects; those features will be linked to from our upcoming Signal posts. For now, I’ll leave you with a roundup of some of the NDSR news you might have missed over the past few months:
- Heidi’s answer to the question “so what exactly is Dumbarton Oaks, anyway?”
- Julia’s discussion of the work being done at the National Security Archive
- Lauren’s collection of resources related to media preservation
- Jaime’s theory that William Shakespeare would have been a web archivist
The following is a guest post from Lee Nilsson, a National Digital Stewardship Resident working with the Repository Development Center at The Library of Congress.
The 2014 National Agenda for Digital Stewardship makes a clear-cut case for the development of File Format Action Plans to combat format obsolescence issues. “Now that stewardship organizations are amassing large collections of digital materials,” the report says, “it is important to shift from more abstract considerations about file format obsolescence to develop actionable strategies for monitoring and mining information about the heterogeneous digital files the organizations are managing.” The report goes on to detail the need for organizations to better “itemize and assess” the content they manage.
Just what exactly is a File Format Action Plan? What does it look like? What does it do? As the new National Digital Stewardship Resident, I undertook an informal survey of a selection of divisions at the library. Opinions varied as to what should constitute a file format action plan, but the common theme was the idea of “a pathway.” As one curator put it, “We just got in X. When you have X, here’s the steps you need to take. Here are the tools currently available. Here is the person you need to go to.”
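The “pathway” idea can be sketched as data. Here is a minimal, hypothetical example of what one format action plan entry might record, written as a Python dictionary; every field name and value is illustrative and is not drawn from any actual institution’s plan.

```python
# A hypothetical sketch of one file format action plan entry, modeled
# loosely on the "pathway" idea above. All field names and values are
# illustrative; they are not drawn from any actual institution's plan.
action_plan = {
    "format": "WAV (PCM audio)",
    "confidence": "high",  # long-term storage prospects
    "on_ingest": [
        "validate the file with a format-aware tool",
        "record a checksum and basic technical metadata",
    ],
    "significant_properties": ["sample rate", "bit depth", "channel count"],
    "preservation_strategy": "retain as-is; no migration currently planned",
    "review_by": "2015-01",      # timetable for revisiting the plan
    "contact": "audio curator",  # "the person you need to go to"
}

print(action_plan["format"], "->", action_plan["confidence"])  # WAV (PCM audio) -> high
```

Even a plain record like this answers the curator’s three questions: what steps to take, which tools to use and whom to ask.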
For the dedicated digital curator, there are many different repositories of information about the technical details of digital formats. The Library of Congress’ excellent Sustainability of Digital Formats page goes into exhaustive detail about dozens of different file format types. The National Archives of the UK’s now ubiquitous PRONOM technical registry is an indispensable resource. That said, specific file format action plans are not very common.
Probably the best example of File Format Action Plans in practice is provided by the Florida Digital Archive. The FDA attempted to create a plan for each type of file format they preserve digitally. The result is a list of twenty-one digital formats, ranked by “confidence” as high, medium, or low for their long-term storage prospects. Attached to each is a short Action Plan giving basic information about what to do with the file at ingest, its significant properties, a long-term preservation strategy, and timetables for short-term actions and review. Below that is a more technically detailed “background report” explaining the rationale behind each decision. Some of the action plans are incomplete, recommending migration to an as-yet-unspecified format at some point in the future. The plans have not been updated in some time, with many stating that they are “currently under discussion and subject to change.”
A related project was undertaken by the University of Michigan’s institutional repository, which organizes file formats into three specific targeted support levels.
Clicking on “best practices” for a format type (such as the above for audio formats) will take you to a page detailing more specific preservation actions and recommendations. This design is elegant and simple to understand, yet it lacks detailed information about the formats themselves.
The National Library of Australia has taken an even broader approach. The NLA encourages its collection curators to make “explicit statements about which collection materials, and which copies of collection materials, need to remain accessible for an extended period, and which ones can be discarded when no longer in use or when access to them becomes troublesome.” They call these outlines “Preservation Intent Statements.” Each statement outlines the goals and issues unique to each library division. The Preservation Intent Statement for NLA’s newspaper digitization project, for example, goes into the details of what they intend to save, in what format, and what preservation issues to expect. This very top-down approach does not go into great detail about file formats themselves, but it may be useful in clarifying just what the mission of a curatorial division is, as well as providing some basic guidance.
There have been notable critics of the idea of file format action plans based on risk assessment. Johan van der Knijff, on the Open Planets Foundation blog, compared the process of assessing file format risks to “Searching for Bigfoot,” in that these activities always rest on a theoretical framework, and that scarce resources could be better spent solving problems that do not require any soothsaying or educated guesswork. Tim Gollins of the National Archives of the UK argues that while digital obsolescence issues are real in some cases, resources may be better spent addressing the more basic needs of capture and storage.
While taking those critiques seriously, it may be wise to take a longer view. It is valuable to develop a way to think about and frame these issues going forward. Sometimes getting something on paper is a necessary first step, even if it is destined to be revised again and again. Based on my discussions with curators at the Library of Congress, a format action plan could be more than just an “analysis of risk.” It could contain actionable information about software and formats which could be a major resource for the busy data manager. In a sprawling and complex organization like the Library of Congress, getting everyone on the same page is often impossible, but maybe we can get everyone on the same chapter with regards to digital formats.
Over the next six months I’ll be taking a look at some of these issues for the Office of Strategic Initiatives at the Library. As a relative novice to the world of library issues, I have been welcomed by the friendly and accommodating professionals here at the library. I hope to get to know more of the fascinating people working in the digital preservation community as the project progresses.
The end of the year is a great time to take stock. I’m currently in the “have I done irrevocable damage to my body during the holiday snacking season” phase of stock-taking. Luckily, the National Digital Stewardship Alliance isn’t concerned with whether anyone’s going to eat that last cookie and has a higher purpose than deciding whether the pants still fit.
The NDSA was launched in July 2010 but really got going with the organizing workshop held December 15-16, 2010 here in D.C., which makes this December the roughly 3-year anniversary of the start of its work. The workshop refined the NDSA’s mission “to establish, maintain, and advance the capacity to preserve our nation’s digital resources for the benefit of present and future generations” and also established the NDSA organizational structure of 5 working groups with a guiding coordinating committee.
It didn’t take long for the working groups to self-organize and tackle some of the most pressing digital stewardship issues over the first couple of years. The Infrastructure working group released the results from the first Preservation Storage survey and is currently working on a follow-up. The Outreach group released the Digital Preservation in a Box set of resources that provide a gentle introduction to digital stewardship concepts (note to LIS educators: the Box makes a great tool for introducing digital stewardship to your students. Get in touch to see how the NDSA can work with you on lesson plans and more).
The Innovation working group coordinated two sets of NDSA Innovation award winners, recognizing “Individual,” “Project,” “Institution” and “Future steward” categories of superior work in digital stewardship, while the Content working group organized “content teams” around topic areas such as “news, media and journalism” and “arts and humanities” to dive more deeply into the issues around preserving digital content. This work led to the release of the first Web Archiving survey in 2012, with the second underway. The Geospatial Content Team also released the “Issues in the Appraisal and Selection of Geospatial Data” report (pdf) in late 2013.
The NDSA has also worked to inform the digital stewardship community and highlight impressive work with an expanding series of webinars and through the Insights and Content Matters interview series on the Signal blog.
And not least, the “2014 National Agenda for Digital Stewardship” integrated the perspective of NDSA experts to provide funders and executive decision-makers insight into emerging technological trends, gaps in digital stewardship capacity and key areas for funding, research and development to ensure that today’s valuable digital content remains accessible and comprehensible in the future.
Over the coming year, the NDSA will expand its constituent services, working to integrate its rapidly expanding partner network into the rich variety of NDSA activities. The NDSA will also expand its interpersonal outreach activities through broad representation at library, archive and museum conferences and by engaging with partners in a series of regional meetings that will help build digital stewardship community, awareness and activity at the local level.
The next NDSA regional meeting is happening in Philadelphia on Thursday January 23 and Friday January 24, hosted by the Library Company of Philadelphia. We’re also in the early planning stages of a meeting in the Midwest to leverage the work of NDSA partner the Northern Illinois University Library and their POWRR project.
Look for more blog posts in 2014 that provide further guidance on the Levels of Preservation activity. The Dec. 24 post starts working through the cells on each of the levels, with an opening salvo addressing data storage and geographic location issues.
The NDSA has also published a series of reports over the past year, including the “Staffing for Effective Digital Preservation” report from the Standards and Practices working group. Look for a new report early in 2014 on the issues around the release of the PDF/A-3 specification and its benefits and risks for archival institutions.
The NDSA can look back confidently over the past three years to a record of accomplishment. It hasn’t always been easy; it’s not easy for any volunteer-driven organization to accomplish its goals in an era of diminishing resources. But the NDSA has important work to do and the committed membership to make it happen.
And like the NDSA, I’m looking forward to a healthier, happier 2014, putting those cookies in the rear-view mirror and hoping the pants will eventually fit again.
Curiously, most of us in the digital memory business are hesitant to visually document our own work. Possibly this has to do with the perceived nature of the enterprise, which involves tasks that may seem routine. But pictures tell an important story, and I set about finding a few that depicted some of the digital preservation focal points for the past year.
I did a Flickr search for the words “digital” and “preservation” and limited the results to photos taken in 2013. I also limited the results to “only search within Creative Commons-licensed content” and “find content to modify, adapt, or build upon.” There were 2 to 3 dozen results. While most fell into a couple of common categories, I was pleased to find 11 that struck me as especially engaging, unusual or otherwise interesting.
And while digitization is only a first step in digital preservation, I included a couple of shots that depict digital reformatting activities.
The NDSA levels of digital preservation are useful in providing a high-level, at-a-glance overview of tiered guidance for planning for digital preservation. One of the most common requests received by the NDSA group working on this is that we provide more in-depth information on the issues discussed in each cell.
To that end, we are excited to start a new series of posts, set up to help you and your organization think through how to go about working your way through the cells on each level.
There are 20 cells in the five levels, so there is much to discuss. We intend to work our way through each cell, expounding on the issues it raises. We will define some terms, identify key considerations and point to some secondary resources. If you want an overall explanation of the levels, take a look at The NDSA Levels of Digital Preservation: An Explanation and Uses.
Let’s start with row one cell one, Protect Your Data: Storage and Geographic Location.
The Two Requirements of Row One Column One
There are only two requirements in the first cell, but there is actually a good bit of practical logic tucked away inside the reasoning for those two requirements.
Two complete copies that are not collocated
For starters you want to have more than one copy, and you want to have those two copies in different places. The difference between having a single point of failure and two points of failure is huge. For someone working at a small house museum that has a set of digital recordings of oral history interviews, this might be as simple as making a second copy of all of the recordings on an external hard drive, taking that drive home and tucking it away somewhere. If you only have one copy, you are one spilt cup of coffee, one dropped drive, or one massive power surge or fire away from having no copies. While you could meet this requirement literally by simply making any type of copy of your data and taking it home, it will become clear that this alone is not a tenable solution for making it further up the levels in the long run. The point of the levels is to start somewhere and make progress.
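The arithmetic behind keeping two copies in different places is worth making explicit. A rough sketch, assuming (purely for illustration) a 5% annual chance of losing any one copy, and assuming the two copies fail independently, which is exactly what storing them in separate locations is meant to approximate:

```python
# Back-of-the-envelope illustration of why a second, separately stored
# copy matters. The 5% annual loss probability is invented for the
# example, and the copies are assumed to fail independently -- which is
# what keeping them in different places is meant to approximate.
p_loss = 0.05

one_copy = p_loss          # chance of losing your only copy this year
two_copies = p_loss ** 2   # chance of losing both independent copies

print(f"one copy:   {one_copy:.4f}")    # 0.0500
print(f"two copies: {two_copies:.4f}")  # 0.0025 -- twenty times less likely
```

Note that copies sitting side by side on the same shelf are not independent: the same fire or flood takes both, and the arithmetic above no longer holds.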
With this said, it’s important to note that not all storage media are created equal. The difference in error rates between something like a flash drive on your key chain and an enterprise hard disk or tape is gigantic. So gigantic, in fact, that on error rate alone you would likely be better off having only one copy on a far better quality piece of media than two copies on something like two cheap flash drives. Remember, though, the hard error rate of the storage devices is not the only factor you should be worried about. In many cases, human error is likely to be the biggest factor resulting in data loss, particularly when you have a small (or no) system in place.
“Complete” copies are an important factor here. Defining “completeness” is something worth thinking through. For example, a “complete copy” may be defined in terms of the integrity of the digital file or files that make up your source and your target. At the most basic level, when you make copies you want to do a quick check to make sure that the file size or sizes in the copy are the same as the size of the original files. Ideally, you would run a fixity check, comparing for instance the MD5 hash value for all the first copies with the MD5 hash value of the second copies. The important point here is that “trying” to make a copy is not the same thing as actually having succeeded in making a copy. You are going to want to be sure you do at least a spot check to make sure that you really have created an accurate copy.
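A fixity check like the one described can be scripted in a few lines. This is a minimal sketch using only Python’s standard library; the file names and contents are made up for the example.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hash, reading in chunks to handle large files."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(original, copy):
    """Return True only if the copy's MD5 matches the original's."""
    return md5_of(original) == md5_of(copy)

# Example with made-up file names: create an "original," copy it,
# and confirm that the fixity check passes.
src, dst = Path("original.bin"), Path("copy.bin")
src.write_bytes(b"oral history interview, tape 1")
dst.write_bytes(src.read_bytes())
print(verify_copy(src, dst))  # True
```

A matching hash is far stronger evidence than a matching file size: any single flipped bit in the copy will change the MD5 value and make the check fail.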
For data on heterogeneous media (optical discs, hard drives, etc.) get the content off the media and into your storage system
A recording artist ships a box full of CDs and hard disks to their label for production of their next release. A famous writer offers an archive her personal papers and includes two of her old laptops, a handful of 5.25 inch floppies, and a few keychain-quality flash drives. An organization’s records management division is given a crate full of rewritable CDs from the accounting department. In each of these cases, a set of heterogeneous digital media has ended up on the doorstep of a steward, often with little or no preliminary communication. Getting the bits off that media is a critical first step. None of these methods of storage is intended for the long term; in many cases things like flash drives and rewritable CDs are not intended to function, even in optimal conditions, for more than a few years.
So, get the bits off their original media. But where exactly are you supposed to put them? The requirement in this cell suggests you should put them in your “storage system.” But what exactly is that supposed to mean? It’s intentionally vague in this chart in order to account for different types of organizations, resource levels and overall departmental goals. With that said, the general idea is that you want to focus on good quality media (designed for longer rather than shorter life), for example “enterprise quality” spinning disk or magnetic tape (or some combination of the two), and a way of managing what you have. For this first cell, the focus is on the quality of the media. However, as the requirements move further along, it is going to become increasingly important to be able to check and validate your data. Thus, easy ways to manage the data on all of your copies become a critical component of your storage strategy. For example, a library of “good” quality CDs could serve as a kind of storage system. However, managing all of those pieces of individual media would itself become a threat to maintaining access to that content. In addition, when you inevitably need to migrate forward to future media, the need to individually transfer everything off of that collection of CDs would become a significant bottleneck. In short, the design and architecture of your storage system is a whole other problem space, one not really directly covered by the NDSA Levels of Digital Preservation.
The NDSA Levels of Digital Preservation: An Explanation and Uses Megan Phillips, Jefferson Bailey, Andrea Goethals, Trevor Owens
How Long Will Digital Storage Media Last? Personal Digital Archiving Series from The Library of Congress
And if so, why would you ever want to? About a year ago the University of Iowa Libraries Special Collections announced a rather exciting project: to digitize the data tapes from the Explorer I satellite mission. My first thought: the data on these tapes is digital to begin with, so there’s not really anything to digitize here. They explain that the plan is to “digitize the data from the Explorer I tapes and make it freely accessible online in its original raw format, to allow researchers or any interested parties to download the full data set.” It might seem like a minor point for a stickler for vocabulary, but that sounds like transferring or migrating data from its original storage media to new media.
To clarify, I’m not trying to be a pedant here. What they are saying is clear and it makes sense. With that said, I think there are actually some meaningful issues to unpack here about the difference between digital preservation and digitization and reading, encoding and registering digital information.
Digitization involves taking digital readings of physical artifacts
In digitization, one uses some mechanism to create a bitstream: a representation of some set of features of a physical object in a sequence of ones and zeros. In this respect, digitization is always about the creation of a new digital object. The new digital object registers some features of the physical object. For example, a digital camera registers a specific range of color values at a specific but limited number of dots per inch. Digital audio and video recorders capture streams of discrete numerical readings of changes in air pressure (sound) and discrete numerical readings of chroma and luminance values over time. In short, digitization involves taking readings of some set of features of an artifact.
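The idea of digitization as discrete readings can be shown in miniature. The sketch below “samples” a continuous sine wave, standing in for changes in air pressure, at an arbitrarily low rate of 40 readings per second and quantizes each reading to 8 bits; all the numbers are chosen only to keep the example small.

```python
import math

# Digitization in miniature: take discrete numerical readings of a
# continuous signal. A 5 Hz sine wave (standing in for changes in air
# pressure) is sampled 40 times per second and quantized to 8 bits.
# Both rates are arbitrary, chosen only to keep the example small.
sample_rate = 40  # readings per second
freq = 5          # signal frequency in Hz

samples = []
for n in range(sample_rate):  # one second of readings
    t = n / sample_rate
    value = math.sin(2 * math.pi * freq * t)  # the "continuous" reading
    quantized = round((value + 1) / 2 * 255)  # map [-1, 1] onto 0..255
    samples.append(quantized)

# The result is a new digital object: a bitstream registering only a
# limited set of features (40 samples/s, 256 levels) of the original.
print(len(samples), min(samples), max(samples))  # 40 0 255
```

The new object captures only what the chosen sample rate and bit depth can register; everything finer in the original signal is simply not recorded.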
Reading bits off old media is not digitization
Taking the description of the data tapes from the Explorer I mission, it sounds like this particular project is migrating data: reading the sequence of bits off their original media and then making them accessible. On one level it makes sense to call this digitization; the results are digital, and the general objective of digitization projects is to make materials more broadly accessible. Moving the bits off their original media and into an online networked environment feels the same, but there are some important differences. If we have access to the raw data from those tapes, we are not accessing some kind of digital surrogate, or some representation of features of the data; we are working with the original. The allographic nature of digital objects means that working with a bit-for-bit copy of the data is exactly the same as working with the bits encoded on their original media. With this noted, perhaps most interestingly, there are times when one does want to actually digitize a digital object.
When we do digitize digital objects
In most contexts of working with digital records and media for long-term preservation, one uses hardware and software to get access to and acquire the bitstream encoded on the storage media. With that said, there are particular cases where you don’t want to do that. In cases where parts of the storage media are illegible, or where there are issues with getting the software in a particular storage device to read the bits off the media, there are approaches that bypass the storage device’s interpretation of its own bits and instead resort to registering readings of the storage media itself. For example, a tool like the KryoFlux can create a disk image of a floppy disk that is considerably larger in file size than the actual contents of the disk. In this case, the tool is actually digitizing the contents of the floppy disk. It stops treating the bits on the disk as digital information and shifts to recording readings of the magnetic flux transition timing on the media itself. The result is a new digital object, one from which you can then work to interpret or reconstruct the original bitstream from the recordings of the physical traces of those bits you have digitized.
So when is and isn’t it digitization?
So, it’s digitization whenever you take digital readings of features of a physical artifact. If you have a bit-for-bit copy of something, you have migrated or transferred the bitstreams to new media but you haven’t digitized them. With that said, there are indeed times when you want to take digital readings of features of the actual analog media on which a set of digital objects are encoded. That is a situation in which you would be digitizing a set of features of the analog media on which the digital objects reside. What do you think? Is this a helpful clarification? Do you agree with how I’ve hashed this out?
After reading a great post by the Smithsonian Institution Archives on Archiving Family Traditions, I started thinking about my own activities as a steward of my and my family’s digital life.
I give myself a “C” at best.
Now, I am not a bad steward of my own digital life. I make sure there are multiple copies of my files in multiple locations and on multiple media types. I have downloaded snapshots of websites. I have images of some recent important texts. I download copies of email from a cloud service into an offline location in a form that, so far, I have been able to read and migrate across hardware. I have passwords to online accounts documented in a single location that I was able to take advantage of when I had a sudden loss.
I certainly make sure my family is educated about digital preservation and preservation in general, to the point that I think (know?) they are sick of hearing about it. I have begun a concerted but slow effort to scan all the family photos in my possession and make them available with whatever identifying metadata (people, place, date) that I gathered from other family members, some of whom have since passed away. I likely will need to crowdsource some information from my family about other photos.
But I am not actively archiving our traditions. I often forget to take digital photos at events, or record metadata when I do take them. I have never collected any oral histories. I have not recorded my own memories. I do have some of my mother’s recipes (and cooking gear) and I need to make sure that these are documented for future generations. I have other items that belonged to my mother and grandmother that I also need to more fully document so others know their provenance and importance. And then I need to make sure all my digital documentation is distributed and preserved.
I asked some friends what they were doing, and got some great answers. One is creating a December Daily scrapbook documenting the activities of the month. One has been documenting the holiday food she prepares and family recipes for decades, in both physical and digital form. One has been making a photobook of the year for every year since her children were born, and plans to create a book of family recipes. Another has been recording family oral histories, recording an annual family religious service for over 20 years, and is digitizing family photos that date back as far as the 1860s.
How are you documenting and archiving your family’s traditions, whether physical or digital? And preserving that documentation?
The humble bloggers who toil on behalf of The Signal strive to tell stimulating stories about digital stewardship. This is unusual labor. It blends passion for a rapidly evolving subject with exacting choices about what to focus on.
Collecting, preserving and making available digital resources is driving enormous change, and the pace is so fast and the scope so broad that writing about it is like drinking from the proverbial firehose.
Back when The Signal was a mere eye gleam, institutional gatekeepers were, as is their wont, skeptical. “Can you make digital preservation interesting?” they asked. “Is there enough to write about? Will anyone care?”
While we responded with a bureaucratic version of “yes, of course!” to each question, we had to go prove it. Which, after many months and hundreds of posts, I think we have done.
I attribute success to stories that have meaning in the lives of our readers, most of whom care deeply about digital cultural heritage. As noted, that topic is as diverse as it is dynamic. A good way to gauge this is to consider the range of posts that were the most popular on the blog for the year. So here, ranked by page views based on the most current data, are our top 14 posts of 2013 (out of 257 total posts).
- 71 Digital Portals to State History
- You Say You Want a Resolution: How Much DPI/PPI is Too Much?
- Is JPEG-2000 A Preservation Risk?
- Scanning: DIY or Outsource
- Snow Byte and the Seven Formats: A Digital Preservation Fairy Tale
- Social Media Networks Stripping Data from Your Digital Photos
- Fifty Digital Preservation Activities You Can Do
- Announcing a Free “Perspectives on Personal Digital Archiving” Publication
- Top 10 Digital Preservation Developments of 2012
- Analysis of Current Digital Preservation Policies: Archives, Libraries and Museums
- The Metadata Games Crowdsourcing Toolset for Libraries & Archives: An Interview with Mary Flanagan
- Doug Boyd and the Power of Digital Oral History in the 21st Century
- Moving on Up: Web Archives Collection Has a New Presentation Home
- Anatomy of a Web Archive
Special bonus: Page views are only one way to measure top-of-the-yearness. In the blogging world, comments are also important, as they indicate the degree to which readers engage with a post. By that measure, the top 14 posts of 2013 are slightly different.
- 71 Digital Portals to State History (51 comments)
- Snow Byte and the Seven Formats: A Digital Preservation Fairy Tale (21 comments)
- Is JPEG-2000 A Preservation Risk? (17 comments)
- 39 And Counting: Digital Portals to Local Community History (16 comments)
- Social Media Networks Stripping Data from Your Digital Photos (14 comments)
- You Say You Want a Resolution: How Much DPI/PPI is Too Much? (13 comments)
- What Would You Call the Last Row of the NDSA Levels of Digital Preservation? (12 comments)
- CURATEcamp Exhibition: Exhibition in and of the Digital Age (11 comments)
- Word Processing: The Enduring Killer App (10 comments)
- Older Personal Computers Aging Like Vintage Wine (if They Dodged the Landfill) (10 comments)
- Scanning: DIY or Outsource (10 comments)
- Where is the Applied Digital Preservation Research? (8 comments)
- The “Spherical Mercator” of Time: Incorporating History in Digital Maps (8 comments)
- Opportunity Knocks: Library of Congress Invites No-cost Digitization Proposals (7 comments)
Thank you to all our readers, and most especially to our commenters.
Steven Puglia, manager of Digital Conversion Services at the Library of Congress, died peacefully on December 10, 2013 after a year-long battle with pancreatic cancer. Puglia had a profound effect on his colleagues here in Washington and worldwide, and there is a great outpouring of grief and appreciation in the wake of his passing.
The testimony embedded in this tribute demonstrates that Steve’s passing left the cultural heritage, conservation and preservation communities stunned, somber and affectionate. Their words attest to his character, his influence and the significance of his work. He was a rare combination of subject-matter expert and gifted, masterful teacher, who captivated and inspired audiences.
“Generous” is a word colleagues consistently use to describe Puglia – generous with his time, energy, advice and expertise. He was a pleasure to be around, the kind of colleague you want in the trenches with you – compassionate, kind and brilliant, with a wry sense of humor.
Steve enjoyed sharing his knowledge and helping others understand. From International Standards groups to workshops, from guidelines to desk-side help for colleagues, Steve sought out opportunities to teach. During discussions of how detailed to get in the Guidelines, Steve would often remind us that digitization is, by its nature, a technical endeavor…He worked even harder to make it palatable for those who simply hadn’t gotten it yet. — Jeff Reed, National Archives and Records Administration and co-author with Steve Puglia and Erin Rhodes of the 2004 Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files
Photography defined Puglia’s life — both the act of photography and the preservation and access of photographs. It was at the root of his work even as his professional life grew and branched in archival, preservationist and technological directions.
He earned a BFA in Photography from the Rochester Institute of Technology in 1984 and worked at the Northeast Document Conservation Center duplicating historic negatives. In 1988, Puglia earned an MFA in Photography from the University of Delaware and went to work for the National Archives and Records Administration’s reformatting labs as a preservation and imaging specialist.
At NARA, Puglia worked with microfilm, storage of photographs and establishing standards for negative duplication. With the advent of the digital age, Puglia set up NARA’s first digital imaging department and researched the impact of digital technology on the long-term preservation of scanned images. He was instrumental in developing new methods of digital image preservation and helping to set imaging standards.
I feel very fortunate and thankful that I had the opportunity to work alongside Steve and to learn so much from him; Steve was a smart, inquisitive, kind, generous colleague, but even more so, he was an amazing teacher. He was generous in sharing his vast knowledge of digitization as well as traditional photographic processes and concepts – and the intersection of the two – in the work that we were doing at NARA.
I think writing the Guidelines was a labor of love for all of us, but especially for Steve. We collectively worried about how they would be perceived, how they would be useful, and about all the small details of the document. I remember especially struggling and working on the Image Parameter tables for different document types, all of us knowing these would probably be the most consulted part of the Guidelines. The fact that these tables are still relevant and stand strong today is a testament to Steve’s knowledge and contributions to the field. I feel lucky that I had a chance to learn from Steve; he was my first real mentor. We should all feel lucky to benefit from his knowledge. He will be missed. — Erin Rhodes, Colby College and co-author with Steve Puglia and Jeff Reed of the 2004 Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files
In 2011, Puglia joined the Library of Congress as manager of Digital Conversion Services, where he oversaw the research and development of digital imaging approaches, data management, development of tools and other technical support in the Digital Imaging Lab.
It was not his first time working with the Library. In 1991 and 1992 he collaborated with the Preservation Directorate and over the past several years he had been a major contributor to the Federal Agencies Digitization Guidelines Initiative. He became chair of the FADGI Still Image Working Group; in August 2011, he posted an update about the Still Image Working Group on The Signal.
Steve was a driving force in creating guidelines to help steer cultural heritage institutions towards standardized methods for digitizing their treasures. While at NARA, he was the primary author of the Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files, the 2004 document that continues to serve as a teaching tool and reference for all those involved in digital imaging. In 2007, Steve extended his efforts to form the FADGI Still Images Working Group and participated as a key technical member, providing invaluable input on practically every aspect of imaging technique and workflow.
I chaired the group from its start through 2010, and I could not have accomplished half of what I did without Steve. When I was at a loss as to how to best proceed, Steve provided the guidance I needed. He was one of the most genuine and honorable individuals I have known. Steve was selfless in giving his time to anyone who needed assistance or advice, and he will be missed by those who knew him. His passing is a tremendous loss to the cultural heritage imaging community. — Michael Stelmach, former Digital Conversion manager at the Library and past FADGI coordinator.
In reading Puglia’s June 2011 Signal blog post about the JPEG 2000 Summit, you get a sense of his excitement for his work and a taste of how well he could communicate a complex subject in simple language.
This aspect of Puglia’s character comes up repeatedly: his drive to make his work clearly understood by anyone and everyone. In Sue Manus’s blog post introducing Puglia to readers of The Signal, she writes, “He says the next steps include working to make the technical concepts behind these tools better understood by less technical audiences, along with further development of the tools so they are easier to work with and more suited to process monitoring and quality management.” And “From an educational perspective, he says it’s important to take what is learned about best practices and present the concepts and information in ways that help people understand better how to use available technology.”
Colleagues declare that Puglia was a key figure in setting standards and guidelines. They report that he led the digital-preservation profession forward and he made critical contributions to the cultural heritage community. They praise his foresight and his broad comprehension of technology, archives, library science, digital imaging and digital preservation, all tempered by his practicality. And they all agree that the impact of his work will resonate for a long time.
Sometimes the best discussions, the ones you really learn from, are conversations in which the participants express different ideas and then sort them out. It’s like the college dorm debates that can make the lounge more instructive than a classroom. Over the years, I learned from Steve in exchanges leavened with friendly contrariety. For example, in 2003, we were both on the program at the NARA preservation conference. I was helping plan the new Library of Congress audiovisual facility to be built in Culpeper, Virginia, and my talk firmly pressed the idea that the time had come for the digital reformatting of audio and video, time to set aside analog approaches. Steve’s presentation was about the field in a more general way and it was much more cautious, rich with reminders about the uncertainties and high costs that surrounded digital technologies, as they were revealed to us more than a decade ago.
In the years that followed, our small tug of war continued and I saw that Steve’s skepticism represented the conservatism that any preservation specialist ought to employ. I came to think of him as a digital Descartes, applying the great philosopher’s seventeenth century method of doubt to twenty-first century issues. And like Descartes, Steve mustered the best and newest parts of science (here: imaging science) to build a coherent and comprehensive digital practice.
He may have been a slightly reluctant digital preservation pioneer but without doubt he was a tremendous contributor whose passing is a great loss to friends and colleagues. — Carl Fleischhauer, Library of Congress digital format specialist and FADGI coordinator
Puglia’s ashes will be scattered in New Hampshire along a woodland brook that he loved. A fitting end for a photographer.
A few weeks ago, as part of the Aligning National Approaches to Digital Preservation conference, the beta launch of a new resource to catalog and describe digital preservation tools was announced: the Community Owned digital Preservation Tool Registry (COPTR).
The idea behind this registry is to consolidate all of the digital preservation tool resources into one place, eliminating the need for many separate registries across multiple organizations.
As an example of how this will be useful, at NDIIPP we have our own tools page that we have maintained over the years. Many of the tools on this list have been produced either by the Library of Congress or by our NDSA partners, with the overall aim of providing these tools to the wider digital preservation community. Of course, the tools themselves, or the links, change on a fairly regular basis; they are either updated or replaced altogether. And as our list has grown, there is also the possibility of duplication with other such lists or registries being produced elsewhere. We have provided this to our users as an overall resource, but the downside is that it requires regular maintenance. For now, our tools page is still available, but we have put any updates on hold in anticipation of switching over to COPTR.
COPTR is meant to resolve these issues of duplication and maintenance, and to provide a more centralized, up-to-date, one-stop shop for all digital preservation-related tools.
For ease of use, COPTR is presented as a wiki; anyone can add tools to the registry or edit and update existing entries. Here’s how it’s described by Paul Wheatley, one of the original developers of the effort:
“The registry aims to support practitioners in finding the tools they need to solve digital preservation problems, while reducing the glut of existing registries that currently exacerbate rather than solve the challenge. (I’ve blogged in detail about this.)
COPTR has collated the contents of five existing tool registries to create a greater coverage and depth of detail that has to date been unavailable elsewhere. The following organisations have partnered with COPTR and contributed data from their own registries: The National Digital Stewardship Alliance, The Digital Curation Centre (DCC), The Digital Curation Exchange (DCE), The Digital POWRR Project, The Open Planets Foundation (OPF)”
The above organizational list is not meant to be final, however. Wheatley emphasizes that they are looking for other organizations to participate in COPTR and to share their own tool registries.
On the wiki itself, the included tools are grouped into “Tools by Function” (disk imaging, personal archiving, etc.) or “Tools by Content” (audio, email, spreadsheet, etc.). According to the COPTR documentation, the entry for each tool will include a description of the tool and its specific function, relevant URLs to the tool or resources, and any user experiences. Generally, the tools to be included will be anything in the realm of digital preservation itself, such as those performing functions described in the OAIS model or in a digital lifecycle model. More specifically, the COPTR site describes in-scope vs. out-of-scope as the following:
- In scope: characterisation, visualisation, rendering, migration, storage, fixity, access, delivery, search, web archiving; open source software, commercial software and everything in between.
- Out of scope: digitisation, file creation
According to Wheatley, the goal is for organizations to eventually close their own registries and reference COPTR instead. A datafeed from COPTR gives organizations a useful way of exposing the COPTR data (or subsets of it) on their own sites.
This overall goal may sound ambitious, but it’s ultimately very pragmatic: to create a community-built resource that is accurate, comprehensive, up-to-date and eliminates duplication.
COPTR Needs You! To make this effort a success, the organizers are asking for some help:
- Add tools to the list (see the guide here)
- Give feedback
- Promote COPTR
- Consider bringing your organization into partnership with COPTR
- See this “to do” list to help develop COPTR even further.
And feel free to contribute feedback in the comment section of this blog post, below.
COPTR is a registry owned by the community, for the community. It is supported by Aligning National Approaches to Digital Preservation, the Open Planets Foundation, the National Digital Stewardship Alliance, the Digital Curation Centre, the Digital Curation Exchange and the Digital POWRR Project.