The Signal: Digital Preservation
Alongside this year’s Digital Preservation 2013 meeting, I am excited to announce that we will also be playing host to a CURATEcamp unconference focused on exploring the idea of exhibition. For those unfamiliar with unconferences, the key idea is that the participants define the agenda and that there are no spectators: everyone who comes should plan on actively participating in, and helping to lead, discussions. In short, everybody should come ready to work.
An exhibition involves organizing, contextualizing and displaying collection items. As cultural heritage organizations increasingly make both digitized and born-digital materials available, we find a range of opportunities for exhibiting them. Thinking broadly about the idea of exhibition, everything from faceted browsing and visualizations to linear and non-linear modes of presenting materials is part of the interpretive framework through which users make sense of collection materials.
This CURATEcamp unconference offers an opportunity for curators, archivists, librarians, scholars, software developers, computer engineers and others to share, demonstrate and refine ideas about exhibition in the digital age.
I am excited to co-facilitate this unconference with Sharon Leon, director of public projects at the Roy Rosenzweig Center for History and New Media, and Michael Edson, director of web and new media strategy at the Smithsonian Institution.
When: July 25, 2013
Where: Alexandria, VA
Register: You can register for the meeting from the Digital Preservation conference registration page. Note that the CURATEcamp is limited to the first 100 registrants.
Potential Session Topics include:
- Open Authority and Curatorial Voice
- Online Exhibition at Scale
- Visualization as Exhibition
- Exhibiting Born Digital Objects
- Interpretation for Mobile Devices
- Digital Storytelling and Cultural Heritage Collections
- Collection Interfaces that Contextualize
- Storytelling and Linked Data
- Social Media as Exhibition
- Citizen Curators
- Blogs as Serialized Exhibits
- Data Journalism as inspiration for Exhibition
On May 20-21, 2013, the Library of Congress hosted one in its series of small invitational summits on at-risk digital content, this one on the topic of software preservation. “Preserving.exe: Toward a National Strategy for Preserving Software” covered a wide range of topics around software preservation, spanning every type of software as well as interactive media art, and engaged multiple communities, from software creators to curators. Details on the meeting are here.
While there will be later posts and detailed reports from the meeting, there was one topic that I kept considering and that came up at two other conferences I attended this week: hardware preservation. I have long been in the camp that favors building collections of hardware, in part because I have seen the necessity of doing so for audio and video collections, where such hardware is vital for replaying media for researchers or for digitization. And while I had seen successful emulation projects, the idea that we could build all the emulators we would need seemed like a dream.
This week I had my mind changed.
The meeting really brought home for me the astonishing extent of hardware and lower-level software infrastructure we would have to locate, restore and keep running just to operate the application software needed to provide access to content files in our collections. It is a daunting task, and colleagues at institutions that do collect hardware provided a reality check at this meeting as to what it takes.
I saw or heard about some exceptionally successful emulation projects this week. We were given a brief sneak peek at a pilot of the Olive Executable Archive from Carnegie Mellon University, and saw fully playable virtual machines running games. The Multiple Arcade Machine Emulator is so successful a project that, after 10 years, its developers have only a short list of the games they cannot emulate. New York Public Library has been testing interactive visualizations of theatrical lighting design that run using files that are part of their Theatrical Lighting Database. The emscripten project provides a robust framework for emulation in the browser.
For some more recent and common environments–as well as common media format readers–we will almost certainly need to keep hardware running in our organizations to assist us in making preservation copies of media and files that we receive as part of our collection building. And we will need to provide such hardware in our reading rooms. There is a need. But I was convinced this week that emulation may serve our needs better than hardware, except for the need to read the media in our collections to preserve their content.
We cannot all become museums of computer hardware. There are wonderful organizations like the Computer History Museum, the National Museum of Computing, the Heinz Nixdorf MuseumsForum, and the Centre for Computing History that serve that purpose well. And none of this diminishes my feeling that hardware should be collected and preserved. These are artifacts from our computing history. They are examples from the history of industrial design. They are part of my (and others’) personal histories. When we display vintage hardware and media at our personal digital archiving events, they always attract a crowd and elicit many personal stories that help us engage with visitors about the management of their digital files. For all of those reasons I will continue to be an advocate for hardware preservation, but with a different endgame than I had in mind at the beginning of the week.
The following is a guest post by Tess Webre, former intern with NDIIPP at the Library of Congress.
For the past semester I have been working with NDIIPP learning the tools of the trade, creating resources, and crafting fun blog posts (or at least trying). Sad to say, the semester is over. Yes, loyal readers, it is time for me to doff my hat and hit the old dusty trail. Yet, it wouldn’t be a complete ride-off-into-the-sunset moment without an impassioned speech that tied the whole story together, so here goes.
Tess’s long and dramatic speech on the importance of digital stewardship:
Our records are important. It’s a simple statement, but a point worth delving into. Records are important from a financial perspective (such as taxes, purchases, etc.), from a legal perspective (proof of ownership, deeds, etc.), from a historic perspective and familial perspective (genealogy, general warm and fuzziness). However, there is another aspect to the necessity of records. They foster democracy.
Recently, my graduate school hosted a conference featuring Dr. Trudy Huskamp Peterson, former acting Archivist of the United States and archivist for the United Nations High Commissioner for Refugees. She discussed the use of archives, specifically in places that had previously experienced civil unrest – the Balkans in the late 1990s, Somalia, Guatemala. In the aftermath of war, revolution or civic tumult, there are several questions both institutions and individuals ask to achieve any sort of normalcy. What happened? Why did it happen? How did it happen? How can we prevent this from happening again? Victims of violent action and their families have a right to answers, and the archival community has a duty to preserve the records that provide those answers. In these instances preservation is a tool of social justice, providing voices for the voiceless and agency for the victims. Alex Boraine, a South African politician speaking after the end of apartheid, said it better than I ever could: “It is necessary to turn the page of history, but first we need to read that page.”
So, how does this relate to the role of digital stewardship? What does this have in common with digital preservation? Plenty. Internationally, cell phone videos have documented protests, civilian executions and other atrocities. War tribunals and genocide trials can use these pieces of evidence, and more, to convict the guilty and to provide reparation and solace for the victims and their families. In these instances, preserving the data is noble, even dangerous, work.
This is why I first wanted to become an archivist; it’s also why I wanted to become a digital archivist. I wanted to help right the wrongs of history and give voices to the voiceless. To do that, we need to be able to access the evidence that we create digitally. If we can’t do that basic task, then we cannot hope to do anything further. Assuring that we have authentic digital archives, therefore, is a way to ensure that there can be accountability in the long term.
Working here at NDIIPP has only confirmed the nobility of digital stewardship. For example, one of the digital stewardship programs here at NDIIPP supports democracy by providing for the longevity of born-digital government records. Another program does the same with public records. In my time here at NDIIPP, I have been overwhelmed by the opportunities presented to do lasting good.
Well, that’s the end of my speech. I have truly enjoyed my time here at NDIIPP and hope that you all liked my blog posts. (And keep an eye out for a couple more that may pop up from time to time.)
As always, I wish you all safe data.
I’ve always loved the term “lossy” compression (add a “y” to anything and the “cute” factor really goes up). But just like a baby tiger is cute only so long as you understand that it will one day grow into a vicious, man-eating beast, lossy compression is cute only so long as you understand that it may someday come back and bite you if you’re thinking about long-term preservation.
That sounds a bit hyperbolic so let me step back a bit. In 2011 I wrote about IDOM, four simple steps to helping you start thinking about how to preserve your own digital materials (for the record, it’s Identify, Decide, Organize and Make copies). One undeniable factor in “make copies” is that there’s a trade-off everyone has to make between quality and affordability.
We all want to store our digital data at the highest quality possible, but higher quality generally means larger file sizes, which means more storage which means more money. Compressed data, generally speaking, takes up less physical storage space and moves more easily over networks. The file size difference can be dramatic.
Let’s say you wanted to rip your CD collection and store it as high-quality WAVE files on an external hard drive. A digital file that holds a typical three-minute song from a CD is 30–40 megabytes in size, so an average CD comes to around 450 megabytes. If you had 1,000 CDs in your collection you’d need about half a terabyte of storage. Things aren’t so bad these days, cost-wise: half a terabyte would only run you about $40 (10 years ago it would have run you almost $1,200).
Now let’s say you wanted to save storage space by compressing the audio. MPEG Layer III audio encoding (MP3 for short) typically reduces the file size of a song by an order of magnitude. So that half a terabyte would shrink to roughly 45 gigabytes and cost you about $4 total (prices for external hard drives fluctuate quite a bit, so don’t hold me to these numbers!).
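If you want to run the numbers for your own collection, here is a minimal back-of-the-envelope sketch; the per-CD size, compression ratio and storage price are illustrative assumptions taken from the figures above, not authoritative values:

```python
# Back-of-the-envelope storage estimate for a ripped CD collection.
# All constants are illustrative assumptions, not authoritative figures.

CD_WAV_BYTES = 450 * 10**6   # ~450 MB of uncompressed WAVE audio per CD
USD_PER_TB = 80              # rough 2013 price for external hard drive storage

def collection_cost(num_cds, compression_ratio=1):
    """Return (terabytes, dollars) needed to store num_cds ripped CDs."""
    total_bytes = num_cds * CD_WAV_BYTES / compression_ratio
    terabytes = total_bytes / 10**12
    return terabytes, terabytes * USD_PER_TB

for label, ratio in [("WAVE (uncompressed)", 1), ("MP3 (lossy, ~10:1)", 10)]:
    tb, usd = collection_cost(1000, ratio)
    print(f"{label}: {tb:.3f} TB, about ${usd:.0f}")
```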
However, when we’re thinking about preserving digital information we generally want to avoid compressing the data, unless we can compress it “losslessly.” “Lossless” compression means that we can shrink the size of any arbitrary piece of digital content and then restore it, bit for bit, to its original form without losing any information in the transformation process.
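A quick way to see what “lossless” means in practice is a round trip through a general-purpose lossless compressor such as Python’s built-in zlib module (a minimal sketch; the sample bytes stand in for real file contents):

```python
import zlib

# Any byte string will do; in real use this would be a file's raw bytes.
original = b"pretend this is the raw audio of a WAVE file " * 1000

compressed = zlib.compress(original, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

assert restored == original               # lossless: every byte comes back
print(f"{len(original):,} bytes -> {len(compressed):,} bytes and back, no loss")
```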
“Lossy” compression, on the other hand, is a data encoding method that compresses data by removing part of it. Different compression schemes apply different algorithms to determine how to discard data effectively while keeping the result within an acceptable level of quality as determined by the user’s needs, but there’s no getting around the fact that once the data is discarded under “lossy” compression schemes it’s gone for good.
While institutions (and individuals) want to save on costs as much as possible, we all want to retain as much of the utility of the information as we possibly can. We have no idea how much storage or bandwidth will cost in the future (hopefully less) nor do we know what future users might do with current data (undoubtedly many interesting things), but we’re pretty sure we want to keep our options open.
An MP3 is an example of lossy compression. If you compress that original WAVE file utilizing the MP3 compression scheme, the information you remove to decrease the file size is gone for good and you can’t bring it back. It is possible to convert your MP3 back to a WAVE file using available software tools, but all you’ll have is a mediocre WAVE file. The original information is gone and you can definitely hear the difference.
So if you want to preserve an audio file for the long term you either need to keep it in its original format or utilize a compression scheme that allows you to roll back your compressed file to its original form.
There are a number of lossless compression schemes for audio, though they’re not implemented equally by the major digital media players.
The same holds for images: a large amount of data can be discarded before the result is degraded enough to be noticed by the user, but it’s the same situation as the audio described above. Had I been thinking long-term I might have made a different decision on the final-state format for my photo.
If planning these things out from the start, it’s most advantageous to start with a high-resolution master lossless file that can then be used to produce compressed files for different purposes; for example, a multi-megabyte file can be used at full size to produce a full-page advertisement in a glossy magazine while a smaller, lossy copy can be made for a small image on a web page.
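As a sketch of that master-and-derivatives workflow (assuming the Pillow imaging library and illustrative file names), the lossless master stays untouched while lossy copies are generated for each use:

```python
from PIL import Image  # Pillow: pip install Pillow

# Hypothetical lossless master, e.g. a high-resolution TIFF scan.
master = Image.open("master.tif").convert("RGB")  # JPEG has no alpha channel

# Lightly compressed JPEG at full size, suitable for print work.
master.save("print_copy.jpg", "JPEG", quality=95)

# Small, heavily compressed copy for a web page; only the copy is resized.
web_copy = master.copy()
web_copy.thumbnail((400, 400))  # resizes in place, keeps the aspect ratio
web_copy.save("web_copy.jpg", "JPEG", quality=70)

# master.tif on disk is never modified, so the lossless original survives.
```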
A consideration of lossy vs. lossless compression is just one factor in identifying sustainable stewardship practices, but it’s an important one to consider, especially at the start of a digital workflow. The Still Image Working Group of the Federal Agencies Digitization Guidelines Initiative has been exploring these issues in great depth.
Consensus is still developing on most sustainable preservation master formats (see recommendations from NARA, the American Society of Media Photographers and others) but compression is certainly one of the big issues to consider.
The stewardship community will undoubtedly spend plenty of time managing and preserving lossy files (huge numbers of JPEGs and MP3 files are already out there), but if you’ve got the option make yours lossless!
We like to think (and hope) that our blog The Signal acts as an informative resource from which to learn and engage in conversations about digital preservation work. We hope that it exposes you to interesting projects and people stewarding digital collections, and that it creates opportunities to connect you with the wider community of libraries, archives, museums, government agencies and other organizations.
Just as the blog is a vehicle to explore the challenges and solutions practitioners address in managing digital information, we’re coming up on an excellent opportunity to share best practices and lessons learned from our individual and collective work – face-to-face.
That opportunity is Digital Preservation 2013, the annual meeting of the National Digital Information Infrastructure and Preservation Program and the National Digital Stewardship Alliance. This year’s meeting will be held on July 23-25 in Alexandria, VA. Registration for the meeting is now open.
Once a year, we aim to put together a program that stimulates new thinking and exposes us to innovative ideas across the digital information landscape. It’s a chance to highlight collaborative projects on preserving and accessing digital collections, feature discussions of current tools and services for the management of digital content, and share current approaches to storage and infrastructure challenges. For the ninth year now, we’ve been fortunate to gather members of the community and other interested professionals in an environment to share information about collecting, preserving, and delivering our cultural heritage, scientific and other valuable digital materials.
This year is no different and the planning committee has been busy crafting a solid, if not packed, agenda featuring speakers who are strong advocates for social and technological innovation. We’ll hear about topics such as managing “big data” at scale, the cultural and scholarly value of public and historical digital content, and the challenges of preserving digital resources and exploring how to provide access to them.
The meeting kicks off on the afternoon of July 23 with keynote talks by Hilary Mason, chief scientist at bit.ly, and Sarah Werner, undergraduate program director at the Folger Shakespeare Library. Later in the afternoon, we’ll be hearing from a panel of educators and practitioners including Christopher (Cal) Lee of the University of North Carolina at Chapel Hill; Jason Scott of the Archive Team; Anne Wootton of Pop-Up Archive; and Travis May of the Federal Reserve Bank of St. Louis on their own innovative approaches to the preservation of various types of content. We will round out the day with a series of lightning talks and a poster session from members of the National Digital Stewardship Alliance community.
On July 24, Lisa Green, executive director at Common Crawl, and Emily Gore, director for content at Digital Public Library of America, will lead off the day with opening talks. Following, there will be a unique dialogue on the emerging topic of digital preservation and environmental sustainability, the “Green Bytes: Sustainable Approaches to Digital Stewardship” panel with David Rosenthal of Stanford University; Kris Carpenter of the Internet Archive; and Krishna Kant of George Mason University and the National Science Foundation. The final panel of the day features speakers including Aaron Straup Cope of Cooper-Hewitt Museum Labs; Rodrigo Davies from the MIT Center for Civic Media; and Amy Robinson of the EyeWire project talking about issues at the leading edge of digital stewardship activity.
Day two will also feature a variety of smaller breakout sessions that will enable conversation around topics of digital preservation education and training, demos of digital preservation tools and services, and presentations on digital curation topics. And there will be special presentations by the NDSA Innovation Award winners, who will be named in the coming weeks.
On July 25, day three, we’re planning to co-host a CURATEcamp, an unconference. Last year we co-hosted CURATEcamp: Processing, which focused on the intersection between archival and computational notions of processing. This year, we wanted to focus more on the digital stewardship community’s perspectives and discuss ideas about the exhibition of digital collections, dealing with narratives, storytelling and context. Any curators, archivists, librarians, scholars, software developers, computer engineers and others looking to share, demonstrate and refine ideas about exhibition in the digital age are encouraged to attend and contribute to crafting the day’s discussions. More information about the camp will be shared on this blog in the next couple of weeks.
Digital Preservation 2013 will be held at the Westin Alexandria in Old Town Alexandria, VA. There are two separate registration forms, one for the main meeting and one for CURATEcamp: Exhibition. If you plan to attend both, please make sure to register using both forms.
There is no cost to attend either meeting but seats are limited and available on a first come, first served basis.
Late in April, in Ljubljana, Slovenia, the International Internet Preservation Consortium gathered for its annual General Assembly. This year is the 10th anniversary of the organization, and we marked the milestone by reflecting on our past accomplishments and thinking about how the members could work together to make positive and lasting impacts on the field of web archiving.
The mission of the IIPC is “to acquire, preserve and make accessible knowledge and information from the Internet for future generations everywhere, promoting global exchange and international relations.” The original vision included collections built with common tools and practices that would enable interoperable access across systems and borders. This vision is still at the core of what the IIPC is trying to accomplish, and many strides have been made toward this goal.
At this year’s GA, several discussions centered on how the IIPC can keep making significant contributions to cultural heritage and remain relevant in a fast-changing web world. In addition to noting achievements, the members suggested to the Steering Committee how to guide the organization over the next 10 years. Below are the major themes that emerged from these discussions during the week of the General Assembly.
Support Common Tools & Standards
The most obvious achievements of the IIPC are its open source tools for harvesting, processing and navigating web archives: Heritrix, Wayback and WARC Tools. These are the foundational tools used by most of the members, and by a larger worldwide community, for commercial and preservation purposes. Members depend on these tools, and web archiving practice has been built around them. The tools have always been open source, but they are currently being moved to GitHub so a broader base of developers can contribute to, maintain and improve the code. The IIPC also developed the ISO preservation standard for web archives, WARC.
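To give a flavor of what building practice around these standards looks like, here is a minimal sketch that walks a WARC file produced by a crawler such as Heritrix and lists the captured URLs. It uses the open source warcio library, my choice for illustration rather than a tool named at the GA, and a hypothetical file name:

```python
from warcio.archiveiterator import ArchiveIterator  # pip install warcio

# Walk a (hypothetical) crawl output file and list what was captured.
with open("crawl.warc.gz", "rb") as stream:
    for record in ArchiveIterator(stream):
        if record.rec_type == "response":  # actual captured pages
            url = record.rec_headers.get_header("WARC-Target-URI")
            date = record.rec_headers.get_header("WARC-Date")
            print(date, url)
```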
Looking toward the next 10 years, members recognize the current tools need to evolve or change to capture what will be the future of the web. The IIPC must continue to support new tools like the Memento Aggregator and the Live HTTP Proxy harvester. A diversity of tools to collect and preserve web sites will be advantageous to future uses because different tools will be able to collect different parts of the web. Some thinking about the need for evolving tools was presented at the in-depth discussion David Rosenthal of Stanford University and Kris Carpenter of the Internet Archive hosted about the future of web capture.
Build the Collection
The collection built by IIPC members is truly massive and global. Over a petabyte of web sites have been captured and indexed. Entire domains from many countries have been preserved; the newest country to have the legal authority to preserve their whole domain is the United Kingdom. This content will be of great interest to current and future researchers.
During the GA an open conference was held titled “Scholarly Use of Web Archives: Progress, Requirements and Challenges,” where researchers who are currently using web archives in their work shared their methods, interests and (sometimes) frustrations with members. No two researchers seem to want the same thing out of web archives.
Sophie Gebeil of Aix-Marseille-Université presented her research into how African immigration is discussed in the French web domain, analyzing the contents of web sites much like other documentary evidence. Megan Dougherty of Loyola University Chicago, by contrast, explained that she is most interested in specific features and design elements on web pages and how those change over time, not necessarily the intellectual content on the site. Ditte Laursen, a researcher in Denmark, explained that she needs to capture second-by-second changes on social media and television web sites for her work.
IIPC members have web archive content that is of great interest to researchers, but providing access to those collections is often a challenge. Two important researcher-led initiatives to build access tools and establish methodologies for building research corpora were shared; both hold great promise for bridging the gap between those who collect and preserve web archives and their users.
Build the Community
The IIPC began in 2003 with 12 members. Tool development was the original focus. Standards and best practices for web archiving grew with the organization. The IIPC has grown to include 44 members, all willing to share best practices and to develop tools and resources for the global cultural heritage community. It is the primary resource for organizations that are just starting web archiving programs, and it is a venue for organizations with mature web archiving programs that want to advance the field. The work has been truly collaborative, as members worked on projects and tools that met their local needs while contributing to the tools, standards and practices of the web archiving domain. The unique quality of the collaboration has been the focus on shared practice rather than organizational differences. That focus has resulted in an international resource of expertise in web archiving.
In recent years the IIPC established an Education and Training program to fund professional development workshops and to sponsor a PhD student at the University of North Texas Information School for special studies in web archiving. This effort, coupled with the outreach and awareness projects the IIPC has taken on, is important to building the community for web archiving and maturing the field. The IIPC as an organization is also maturing with the addition of its first full-time employee, Mary Pitt of the British Library, who will serve as both the Program and Communication Officer.
The National and University Library of Slovenia was a perfect host for the 2013 IIPC General Assembly. If you want to know more about what was shared over the week, see the detailed summary of the GA by Ahmed AlSum of Old Dominion University. Rosalie Lack of the California Digital Library also shared her impressions.
All presentations will be posted on the IIPC website.
A smart-alecky way to answer the question in the title above would be: “why everything, of course.” But we don’t traffic in snark here, at least not intentionally.
User expectations influence so much of what stewardship organizations do. We collect and preserve all content primarily to support use, but the issue is especially important in a digital context.
Digital stewardship at scale can involve significant resources over a long period of time, and justifying that allocation centers on access. As the Blue Ribbon Task Force on Sustainable Preservation and Access noted as a primary finding, “when making the case for preservation, make the case for use. Without well articulated demand for preserved information, there will be no future supply.”
The matter jumped to the forefront of my attention a few weeks ago during Archiving the Web: How to Support Research of Future Heritage?, a meeting at the National Library of the Netherlands. All the presentations dealt with the issue of connecting with researchers, but one by Bernhard Rieder struck me as intriguingly proactive. Rieder is a researcher in the Media Studies department of the University of Amsterdam and is interested in using big collections of web archives to study social patterns and trends. He talked about the methodological challenges of using such collections and concluded with a “wish list” of what he wanted from institutions that collect web archives. While I do not necessarily agree with these wishes, I did appreciate hearing his perspective. Here is a summary of the list:
- Moral, political and legal support for researchers and others building large collections of web data.
- Fast, search-based access to a continuous 1% sample of institutional holdings, with translated URLs and click data, as well as comprehensive statistics for the collection. (One way such a sample might be implemented is sketched after this list.)
- An easy, non-bureaucratic way to submit researcher data collections to institutions, without having to use standardized formats.
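On that second wish, a deterministic, hash-based selection is one plausible way to keep a 1% sample “continuous,” since the same URLs land in the sample on every crawl. This is a sketch of the general technique, not anything Rieder specified:

```python
import hashlib

def in_one_percent_sample(url: str) -> bool:
    """Deterministically place ~1% of URLs in the sample.

    Hashing the URL means it is always in (or out of) the sample,
    which keeps the 1% slice consistent across repeated crawls.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100 == 0

urls = [f"http://example.org/page/{i}" for i in range(10_000)]
sample = [u for u in urls if in_one_percent_sample(u)]
print(f"{len(sample)} of {len(urls)} URLs selected (~1%)")
```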
Other researchers surely have their own wishes, and it does seem to me that libraries and archives could benefit from hearing them. There is a novel approach to doing this underway at the University of Melbourne, where the IT Research Services department actively engages with researchers, including face-to-face meetings to learn about what they want. The university found that researchers appreciate the chance to meet, and as a result the department is taking an active approach to social engagement with a “Bazaar” of choices and approaches. Many of the approaches focus on helping researchers learn about tools and how best to apply them in working with data collections.
We also have explored the issue on this blog, with a series of guest posts by Kalev H. Leetaru on A Vision of the Role and Future of Web Archives. The International Internet Preservation Consortium also recently held a meeting on the subject of Scholarly Access to Web Archives: Progress, Requirements, and Challenges.
It would be great to learn about other efforts to plumb researcher interest–please let us know about them.
While Noah Lenstra was working on a website about African-American history in Champaign-Urbana, Illinois, many of the people he met at local public libraries, churches and businesses told him they had personal and family memorabilia they wanted to digitize, or they had digital stuff that they didn’t know what to do with. Lenstra, a PhD student at the University of Illinois Graduate School of Library & Information Science, saw that there was a need in the community for personal digital archiving guidance. So with the help of a grant from the Illinois Humanities Council, he gave a series of much-appreciated public workshops in various cities and towns around the state on the topic of “Digital Local & Family history.” The workshops yielded a few startling revelations for Lenstra.
Like any good educator, Lenstra not only wanted to share information and confirm the effectiveness of his work, he also wanted to observe the participants to see what he could learn from their actions. So he spent a lot of time working shoulder to shoulder with them, huddled in groups around computers, watching what they did online, asking questions and taking notes.
As he expected, genealogy and local history were popular (they have long been a draw for public libraries) and almost 70% of the attendees told him they were equally interested in both. But he also noticed two striking trends. First, librarians, particularly front-line staff, were hungry for guidance about personal digital archiving, and some traveled great distances to take the workshop. And second, while people in the community accepted his didactic “eat your vegetables” personal digital archiving advice (the kind I dole out all the time), they had already organically developed a fun approach on their own that they used all the time: they uploaded their personal archives to Facebook and other social media sites to share with their family, friends and community. Although a vocal minority of participants had privacy concerns with the “social web” and did not use it, most had started using these tools both to share personal archives with others and to find content in others’ personal archives that documented facets of their own lives.
Lenstra didn’t rely solely on showing PowerPoint slides in his workshops. “We tried to make the workshops as interactive as possible,” said Lenstra. “I introduced them to sites like 1000Memories (now part of ancestry.com) and omeka.net, and had them create accounts. We had hands-on scanning exercises. I also encouraged participants to use the computers to look up things I mentioned during the workshops.”
But Facebook kept popping up in conversations.
Eventually Lenstra realized that the widespread use of Facebook indicated a need and desire for an easy-to-use tool that would enable users to stay in touch with people in the present and connect with their past and with others who shared that past. “People were interested in collaborative platforms for sharing some of their personal archives with others,” said Lenstra. “They wanted to use their personal archives for purposes that went beyond them as individuals.”
Connection is key to understanding the phenomenon. Lenstra said, “People want to make connections between things they have in their own personal archives and things that may be in other people’s personal archives, between one person’s history and other people’s histories.”
Lenstra gave an example of a man whose father played in a local band in the 1960s and 1970s. The son uploaded a photo of his father and his band to a site that had the theme, “You know you grew up in Urbana-Champaign if you remember….” Lenstra said, “People who were related to the other members of the group started tagging their own family members. In no time, all the people in the photograph were identified and then other people started sharing their memories of people seeing the band perform. That kind of thing happens all the time, where a personal item quickly becomes something that’s not just about the family, it’s about the place where that family lived. In people’s memories, both the family and the place are bound together.”
Another fact that Lenstra teased out from the tangle of observations is that, for some people, Facebook has become a convenient digital repository. “I’ve heard people say that their computers crashed and they lost a lot of photographs,” said Lenstra. “And they were able to recover many photographs through Facebook because they had put so many of their photos on it. I am not saying that Facebook is a solution, but I think pragmatically for some people it may be a more viable option than taking on the responsibility of backing up their files every five years or so. I remain dubious that people are actually going to do that.”
Setting the workshops in public libraries was a smart choice. Lenstra said that public libraries have a unique role in the community and libraries are increasingly playing a more active role in helping their communities with personal digital matters. About 25 percent of the workshop attendees were staff from public libraries from the cities where the workshops were held and from the surrounding towns.
Librarians are also getting more involved with local history than ever and, by extension, with individual history. Lenstra cites one example of a librarian from a small town in southern Illinois who developed a local history Facebook page. Occasionally she would post photos on Facebook and within a short time people from her social network would identify some of the photos and the people in them. Lenstra said, “Another time the librarian wanted to purchase a local historical artifact that she found on eBay, so she just posted a message on her page to the effect that the library wanted to purchase it and she was soliciting donations. Within a day, people had pledged enough money to cover the cost. She is enthusiastic about how she — a public librarian — can catalyze the attention of people in her community around sharing local and family histories.”
Lenstra gave an example of how a library in Champaign-Urbana recognized and responded to a digital-preservation need in their community. The library has two publicly accessible scanners intended for patrons to scan copies from the local archives. “But now, the scanner is used more by people who bring in personal things that they want to scan,” said Lenstra. “The library didn’t plan for this but the scanners are a resource that people want and need, so now the library is advertising the scanners as resources for people to digitize portions of family photographs or whatever. Clearly there is a real opportunity to reinvent this service area.”
Lenstra’s next workshop is on June 22 and he sees it as an occasion to take what he’s learned so far and fine tune the workshop. He’s still mulling over questions like, “How can libraries make Facebook work for them?” and “Is there something libraries can do or create that fulfills the same role that Facebook now fills?” Lenstra said, “We’re still trying to figure out what is motivating people. What do they get that’s personally valuable out of some of these social media sites?”
He plans to meet eventually with library administrations to diagnose what is going on in their libraries and explore ways that libraries could better share local resources, serve their community’s digital preservation needs and support local and family history. Perhaps they could partner with other local institutions or maybe act as a community digital repository. Lenstra said, “One of the libraries we worked with had a partnership with the county government and the county government did all of their IT support. That kind of resource-sharing partnership makes new things possible without draining existing resources.”
But it’s clear that technology needs to catch up to the way people behave – which partially explains the popularity of social media – and information professionals need to acknowledge how people actually do things. Despite our best intentions and our wishes for more people to practice good personal digital archiving, the number of people who actually back up their files is small compared to the number of people who upload many of those same files to Facebook. This observation is backed up by Microsoft Research’s Cathy Marshall, an early leader in research on personal digital archiving. In a Library of Congress presentation, she said, “Nobody really does backup. They secretly hope that someone else is doing backup.”
Lenstra made a similar observation. He said, “I don’t think most people see personal digital archiving as something inherently useful. Many see personal archives as a means to some other kind of end. And so when we do talk about it to them, what kind of language can we use that wouldn’t be overwhelming or just be perceived as an onerous burden on their time?”
Lenstra is not suggesting that, given the general public’s inaction about “what you should do,” we should stop offering personal digital archiving help. People might not back up their files but they should still be aware of good preservation practices. And, in response to how people actually do things, we information professionals should rethink our approach.
Public libraries are the common meeting ground for all of us. As for how to improve the situation, Lenstra said, “I’d like to see more public librarians get the support they need. And I’d like to see more public libraries offer services that help people, so people feel like they don’t just have to do things on Facebook…help people understand that there are options or resources in their community to help them preserve their personal digital stuff. I may be idealistic but I’m hoping that, with support, more public libraries can and will help.”
This is a Guest Post by Abbie Grotke, the Library of Congress Web Archiving Team Lead and Co-Chair of the National Digital Stewardship Alliance Content Working Group.
In this installment of the Content Matters interview series of the National Digital Stewardship Alliance Content Working Group, I interview Jim Corridan, President of the Council of State Archivists and Matt Veatch and Beth Shields, Co-Chairs of the State Electronic Records Initiative Steering Committee about their work.
Abbie: What is the Council of State Archivists? What sorts of organizations are involved?
Jim, Matt and Beth: The Council of State Archivists is the non-profit association comprising the 56 directors of the principal archival agencies in each state and territorial government in the United States. In a majority of states, the state archivists also have responsibility for records management services.
Abbie: Could you tell us a bit about what you see as the primary value of state archives and government electronic records?
Jim, Matt and Beth: While not well understood, state archives play a critical role in our democracy. They collect the records — including electronic records — that ensure government transparency and accountability; protect the legal, civil and property rights of citizens; promote historically-informed public policy decisions; and preserve essential documentation of the nation’s history.
Abbie: How are the electronic records in state archives being used, and by what types of users?
Jim, Matt and Beth: At this early stage in the effort to collect digital materials, users of state archives electronic records appear to be similar to users of paper records — attorneys performing legal research, journalists reviewing governmental decisions, scholars investigating community history, teachers preparing lesson plans, students incorporating primary sources into research projects, property owners identifying previous residents of their homes and genealogists seeking family history. As state archives enhance their capacity to collect and make electronic records available online, the number and variety of users will expand and use cases will diversify.
Abbie: Tell us about the background of the State Electronic Records Initiative — how did it get started, and what are the goals?
Jim, Matt and Beth: State archives have long worked to address electronic records effectively and comprehensively. The first state electronic records initiative began in 1979 with a grant to the State Historical Society of Wisconsin. However, state archives have struggled to keep pace with the exponential increase in the volume and complexity of electronic records, particularly in the face of precipitous funding cuts over the last decade. Thus, what was for many years a major concern has now become a crisis.
In July 2011, CoSA launched the State Electronic Records Initiative, the first comprehensive national effort to improve electronic records preservation in state government, with initial funding provided through Library Services and Technology Act grants from Indiana and Kentucky. In Phase 1, each state archives and records management program completed a survey about their existing electronic records programs and participated in extended follow-up telephone interviews. Using this baseline data, CoSA established goals for additional SERI activities in four broad areas:
- Education and training, to provide state archives staff with opportunities to develop the knowledge and skills required to collect, manage and preserve electronic records.
- Awareness, to raise the level of support and knowledge of the electronic records issue among allied organizations and key stakeholders.
- Governance, to integrate the electronic records management and archives requirements in decisions made during IT planning, procurement, systems development and operations.
- Best practices and tools, to establish a resource center for electronic records standards, tools and policies and to develop pathways to success for strengthening archives and records management programs.
The SERI Steering Committee established subcommittees for each of the four areas of emphasis. Forty individuals representing twenty-three state and territorial archives contribute to the work of these very active subcommittees.
CoSA also received federal grant support for SERI activities in two of the four areas of emphasis. An IMLS Laura Bush 21st Century Librarian grant will fund continuing education scholarships and three immersive, week-long electronic records institutes for staff from all 56 state and territorial archives; and an NHPRC grant will support the development of an interactive web portal for electronic records training, tools and standards, designed specifically to address the needs of government records archivists.
Abbie: What are the biggest digital preservation challenges faced by state archives and records management programs? Have any of those changed since the original survey took place?
Jim, Matt and Beth: State archives and records management programs face many of the same digital preservation challenges that other digital stewards are grappling with: identifying and implementing appropriate digital preservation tools and services; selecting the electronic records that warrant long-term preservation; automating ingest workflows; securing technical infrastructure; capturing adequate preservation metadata; adopting sound preservation planning practices; and providing consumer-friendly access to electronic records. A particularly challenging issue for most state archives is developing effective strategies for ensuring that records management and archival requirements receive appropriate consideration during government information system planning, procurement, design and implementation. State archives operate in an environment in which they are responsible for preserving electronic records of enduring value generated by dozens of different government agencies involved in an array of disparate lines of business. The electronic records produced within this government ecosystem are increasingly complex, interrelated and voluminous. If automated records management and archival functions are not embedded in government information systems, state archives will struggle to effectively preserve the American historical record. As noted earlier, state archives are attempting to meet the digital preservation challenge in an era of severe resource scarcity. SERI is designed to provide state archives with tools that will assist them with all of these challenges.
Abbie: Is there anything that surprised you about the survey results or self-assessments that have been performed as a part of SERI?
Jim, Matt and Beth: Considering that the electronic records issue has been on the radar screen for many years, it was somewhat surprising to see the relatively limited number of state archives that have implemented extensive electronic records programs. That being said, we should note that nearly all state and territorial archives recognize the importance of developing their electronic records programs and, in the months since the SERI Phase 1 assessment, many states have hired electronic records staff and/or launched new or expanded electronic records initiatives. We hope that upcoming SERI activities — particularly the electronic records institutes — will spur additional states and territories to engage digital preservation more aggressively.
Abbie: Your strategic training and education program sounds incredibly valuable and timely. What sorts of topics will be covered in the training? How many will be trained as a part of this program?
Jim, Matt and Beth: CoSA is enthusiastic about the potential impact of the IMLS-funded SERI education and training program. The electronic records institutes kick off this summer with a one-week introductory session for state archives that are in the initial stages of establishing a digital preservation program. The curriculum for this intensive training camp will take a lifecycle approach to electronic records management and digital preservation and covers topics ranging from policy development, advocacy, collaboration and financial sustainability to metadata standards, workflows and trustworthy digital repositories. In 2014, introductory institute attendees will have the opportunity to participate in one of two advanced electronic records institutes that will explore electronic records and digital preservation in more depth. State archives with more developed electronic records programs also will participate in one of the advanced institutes. A series of webinars on specific digital preservation topics will round out the training opportunities afforded to electronic records institute participants.
In addition to the institutes, the SERI IMLS grant also provides all 56 states and territories with up to $1000 in grant funds to use for staff training on electronic records management and preservation. State archives can use these continuing education funds for off-site training, on-line training, collaborative regional training, or to bring training on-site.
Abbie: I noticed on your website that SERI has produced a video. Are there others planned?
Jim, Matt and Beth: While SERI has posted one YouTube video – Mary Beth Herkert, Oregon State Archivist, describing her state’s cloud-based records management system — we don’t have specific plans for additional videos. However, the electronic records institutes and work on the interactive web portal may generate opportunities to share our ideas through videos. CoSA is also developing a series of webinars on electronic records, and those too may soon be publicly available.
Abbie: What is the current focus of work for SERI? Where do you go from here?
Jim, Matt and Beth: The IMLS and NHPRC grant projects will continue to be SERI’s focus for the next 12 to 18 months. Beyond that, we are evaluating strategies for engaging CoSA’s stakeholders more effectively, enhancing digital preservation awareness and expertise within the broader government information management community, and developing a national understanding of the challenges and risks faced by electronic records and the potential loss of significant portions of the American story.
What kind of content matters to you? This is but one case for preserving valuable content for long term access. If you or your institution would like to share your own story of use and long term value of access to a particular type of born-digital resources, please send us a note at email@example.com and in the subject line mark it to the attention of the Content Working Group. We would love to hear from you!
The following interview is a guest post from Karen Cariani, Director of the WGBH Media Library and Archives at WGBH Educational Foundation and Co-Chair for the National Digital Stewardship Alliance Infrastructure Working Group.
Open source software is playing an important role in digital stewardship. In an effort to better understand the role open source software is playing, the NDSA infrastructure working group is reaching out to folks working on a range of open source projects. Our goal is to develop a better understanding of their work and how they are thinking about the role of open source software in digital preservation in general.
For background on discussions so far, review our interviews with Bram van der Werf on Open Source Software and Digital Preservation, Peter Van Garderen & Courtney Mumma on Archivematica and the Open Source Mindset for Digital Preservation Systems and Mark Leggott on Islandora’s Open Source Ecosystem and Digital Preservation. In this interview, we talk with Tom Cramer, Chief Technology Strategist & Associate Director, Digital Library Systems & Services at Stanford University Libraries.
Karen: Could you give us some background on the Hydra project? How did this project come about and what are its goals and objectives?
Tom: Hydra’s goals are to combine the power of a repository for enterprise-scale digital asset management and preservation, with tailored interfaces, workflows and access systems specific to different content types and streams–e.g., articles vs. images vs. time-based media vs. books vs. data. The project started in 2009 when three universities (Hull, Stanford, Virginia) plus Fedora Commons came together to see if they could jointly develop a flexible application framework to complement Fedora. The project motto quickly became “if you want to go fast, go alone; if you want to go far, go together”, and we’ve spent as much energy on building a vibrant and sustainable community as we have on the code.
Karen: Could you tell us a bit about how you and your institution got involved in this project? Further, could you tell us a bit about how your thinking on digital repository platforms has changed and developed over time?
Tom: In 2008-09, Stanford was re-evaluating the architecture and platform for the Stanford Digital Repository. When we started building the first generation of the system in 2005, we decided to write our own repository from scratch, as we didn’t think the existing platforms at the time were good starting points for us. By 2008, we had a much better sense of our needs–which included collaborative development on a shared platform. We also felt that the repository community, and Fedora in particular, had greatly matured, and would make a serviceable component in our environment for the second-generation SDR. In discussions with colleagues at UVa and Hull, it became clear that we were not the only ones with these needs, and that a joint approach could be highly leveraged.
Karen: The NDSA infrastructure working group’s exploration of open software is focused on figuring out if there are any inherent benefits to using open source software for parts of an organization’s digital preservation strategy. Do you see any such inherent benefits, and if so what are they?
Tom: Yes, I believe there is a substantial benefit to using open source software for digital preservation activities. Using open source software isn’t necessarily any cheaper than licensing commercial software, but it does give an institution a predictable ongoing cost for maintenance, and more control over direction than most vendor products allow. Because so much of digital preservation is about the sustainability of supporting a system, these are important factors. The transparency that comes with OSS is also a benefit–with more eyes and more users on a given piece of software, the chances of uncovering latent issues are arguably greater than with proprietary software (though this would only apply in OSS projects with sufficient adopters, of course). Most importantly, though, I think using OSS puts the reins of digital preservation in your institution’s hands in a way that commercial software does not. In another ten years, when digital preservation is better understood and there are a number of well-understood, commodity functions, I’d hope there is a robust and competitive marketplace of commercial solutions. Until then, though, the flexibility and customizability of OSS can provide a more direct path to meeting your preservation needs if they happen to fall outside the current market providers’ product lines.
Karen: I am impressed by the commitment of the community to share and work collaboratively to improve. What do you attribute this to? And how does this improve the project or make it a better model?
Tom: The pattern of collaboration is baked into Hydra’s DNA; from the very beginning, the software has been a shared development effort. I think this may be due to the fact that we didn’t start the code-base by open sourcing a single institution’s system, nor did we have all the heavy-lifting initially done by grant-funded programmers. We put a lot of time into ongoing communication, including weekly calls, constant IRC, and quarterly face-to-face meetings. That said, I think the biggest draw to aggressive collaboration is the quality of the technical work on Hydra. Developers like working with good developers, and by participating in Hydra collaboratively and sharing their work for re-use, many institutions feel like they’ve produced their best code.
Karen: How does the open community work? What are the guidelines or rules for participating?
Tom: We go to great lengths to make the community open and supportive. We value working code and welcome constructive engagement on any front: participants can add code or documentation, help in communications or outreach, or simply try to use the software and ask questions when they need help. We also put a lot of energy into training, to bring new community members up to speed. The Hydra Partners (of which there are now 17) are institutions that have each committed to the success of the project overall, not just their local use. We have a lot of coordination, a lot of consensus building, and very little central planning. In short, there aren’t “rules” so much as a community process, and people get out what they put in.
Karen: What is the best way to keep up on Hydra head development?
Tom: Joining one of the community email lists, firstname.lastname@example.org or email@example.com, is the best way to stay abreast of what’s up. We also do try to keep the website (http://projecthydra.org) up to date, but in a project that is as large and fast moving as Hydra is, there is always a little latency.
Karen: How do you think these projects can become sustainable? And how perhaps does being open help that sustainability vs a licensable vendor supported system?
Tom: I think any software project–open source or commercial–is sustainable when it provides more benefit than cost, and it’s clear to participants that they get more out of participating/using it than going another route. The great thing about vendors is they provide that focusing lens to translate a community’s interest (and revenue) into ongoing development and support that benefits the community. Hydra has succeeded so far for the same reasons; it has focused the community’s efforts around a common approach, and created a framework to enhance and expand it.
Yesterday, May 9, 2013, the U.S. government issued an executive order and an open data policy mandating that federal agencies collect and publish new datasets in open, machine-readable, and, whenever possible, non-proprietary formats. The new policy gives agencies six months to create an inventory of all the government-produced datasets they collect and maintain; a list of datasets that are publicly accessible; and an online system to collect feedback from the public as to how they would like to use the data. The goals are twofold — greater access to government data for the public, and the availability of data in forms that businesses and researchers can better use. This builds on the earlier White House Memorandum on Transparency and Open Government.
These documents were accompanied by a link to something that actually caught my fancy even more – a greatly expanded Project Open Data Github repository for guidelines, use cases and tools. This, alongside the ever-growing (and soon to be extensively updated) data.gov, is evidence of real efforts to release more data and make it truly useful and usable.
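To make “machine-readable” concrete, here is a minimal sketch in Python of the kind of dataset record an agency inventory might publish. The field names loosely follow the Project Open Data metadata schema, but the record itself is invented for illustration; the GitHub repository above holds the authoritative schema and examples.

```python
import json

# An illustrative dataset record for an agency inventory. The field
# names loosely follow the Project Open Data metadata schema, but the
# values are invented; consult the repository for the current schema.
dataset_entry = {
    "title": "Example Agency Widget Inspections",
    "description": "Annual widget inspection results, updated quarterly.",
    "keyword": ["inspections", "widgets", "safety"],
    "modified": "2013-05-09",
    "publisher": "Example Agency",
    "contactPoint": "Jane Doe",
    "identifier": "example-agency-widget-inspections",
    "accessLevel": "public",
}

# Serialized as JSON, a catalog of records like this one is something
# a script, a researcher or a business can consume directly.
print(json.dumps(dataset_entry, indent=2))
```

A catalog of such records, published at a predictable location, is what turns a policy mandate into data that machines can actually find and use.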
The documents provide guidance on open licensing, metadata, and standards, as well as lifecycle-based information stewardship. But what I personally keep struggling with are two questions: What IS open data? And how is it being preserved?
The project has some defining principles for open data that I think can inform any dataset preservation project. While reading through some of the documents, I came across this bullet point:
- Managed Post-Release. A point of contact must be designated to assist with data use and to respond to complaints about adherence to these open data requirements.
I am thrilled to see guidance about active management of datasets and supporting users in their work with the data. But what this and all open dataset projects could use is more attention to dataset preservation. Here are a few great resources on the topic:
- The Library of Congress Sustainability of Digital Formats site on datasets
- A Report on the Preservation of Public Sector Datasets from Archives New Zealand
- Open Data and Archiving Datasets from the National Archives UK
- Life of a Dataset from ICPSR
- Best Practices for Archival Processing for Geospatial Datasets from the GeoMAPP Project
- Datasets, Issues, Contexts and Solutions from the Open Planets Foundation
Do public sector datasets present different issues for preservation from other datasets? Not really. They do face a potentially much higher level of public scrutiny and use. But they involve the same investment of time and money in their creation, serve the same research and public good, and present the same format preservation issues as other research data.
The following is a guest post by Tess Webre, former intern with NDIIPP at the Library of Congress
Preservation Week 2013 might be over, but digital preservation must go on every week of the year. In truth, preservation is an ongoing, long-lasting process that requires active management. Don’t despair, though. I have some suggestions to help keep you in the preservation-y mood until next year.
- Find an online stewardship project that appeals to you and give it some time. Try going here for insight into new projects. At a recent Digital Cultural Heritage DC meeting, members of a prestigious museum said that crowdsourcing in the past few years had accomplished as much work as 14 full-time positions, so know that you are making a difference.
- Locate a digital repository close to you; find out what activities/projects they are undertaking and use social media to promote work that you like.
- Follow NDIIPP on Twitter
- Like NDIIPP on Facebook
- Subscribe to the monthly Library of Congress Digital Preservation Newsletter via email.
- Volunteer at a local repository. Call one up and see if they need help or supplies. They are usually quite nice, in my experience.
- Write some blog posts to increase public awareness for libraries and repositories.
- Do some personal digital estate planning. (Here are some insights.)
- Take a stab at some digital disaster planning. (Here are some more insights.)
- Visit the Eyebeam Gallery and marvel at how they were able to salvage so many of their digital assets that had been damaged in the flooding of Hurricane Sandy.
- Take some time to do inventory on your storage media. Does it need an update? Here is one of the best guides on the topic.
- Do some double checks on your cloud storage. Is everything accounted for?
- Migrate proprietary files to an open format.
- Organize your email. (Really, it’s worth considering.)
- Do some backups of your smart phone, tablet, digital camera and other mobile devices.
- Make sure your digital photographs are accurately described.
- Talk to your children about the need for good digital preservation skills.
- Read your kids a digital preservation fairy tale (This one is also good for adults.)
- Write out your worst personal digital data loss story. Reflect on it.
- Take a look at some of the Library of Congress’ many excellent videos that discuss the challenges and solutions facing digital archivists. I personally love this one of Cory Doctorow.
- Watch a film that deals with preservation (Keanu Reeves just came out with a documentary discussing digital movies.)
- Read NDIIPP’s new collection of digital preservation perspectives; it has many ideas and tips.
- Take some time to brush up on your Spanish by reading this tutorial on digitization of images from Cornell University Library.
- Or, if you don’t speak Spanish, brush up on your French, by reading this digital preservation management planning guide from University of Michigan.
- It’s also available in Italian.
- Or take the time to learn a programming language. It’s extremely satisfying and not that difficult to learn. Just start out with a “hello world” and you’re golden (there’s a starter snippet just after this list).
- If that doesn’t appeal to you, take some time to learn a metadata schema such as Dublin Core, METS or others.
- Pick a digital preservation mascot to keep you motivated.
- Take a look at the new Digital Public Library of America. It’s quite awesome, and has some fantastic exhibits.
- Check out the preservation of video games by going here.
- Or take some time to look at the preservation of geospatial data, by going here.
- Take a peek at how the internet used to look by visiting the Wayback Machine.
- Take a gander at Archive-It and one of its many exhibits.
- Or just take a look at all of the Internet Archive’s many wonderful projects.
- Also, take a look at the Library of Congress’ Minerva collection.
- Or, check out the British Web Archive’s many exhibits from across the pond.
- Or, see how they do web archives down under, by looking at Australia’s web archive Pandora.
- Or, take a look at the Czech Republic’s web archive.
- Finally, take a moment to view the Portuguese web archive.
- Organize an event with a local club to increase awareness of digital preservation initiatives. Remember to serve cake, people will come if there is cake.
- Attend a conference, or lecture discussing digital preservation. There is sometimes cake.
- Distribute digital preservation leaflets, or guides. Some are available here.
- Eat strawberry ice cream with sprinkles, hot fudge, and a maraschino cherry on top. It doesn’t really have anything to do with digital preservation, but if you made it this far down the list, you deserve a treat. Also, this is the best combination of ice cream flavors…according to science.
- Watch one of these fantastic short videos on digital preservation and root for Digiman!
- Check your personal digital archiving knowledge by taking this fun quiz.
- Didn’t do so well? Try giving yourself a personal digital archiving audit (bonus, there are pictures of puppies.)
- Take a look at these personal digital archiving tips.
And of course: 48. Identify; 49. Decide; 50. Organize; 51. Make Copies.
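And if the “learn a programming language” suggestion tempted you, here is the promised starter snippet; in Python, “hello world” really is just one line:

```python
# The traditional first program: print a greeting to the screen.
print("Hello, world!")
```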
Oh dear, that’s 51. Sorry, I got carried away with my enthusiasm for the subject. Anyway, this list should keep you busy until next year’s preservation week. Have fun and let us know if there are other things you think should be on the list.
Until next time, I wish you all safe data.
I think of “citizen archivists” as the first responders of history, arriving early on the scene to gather, capture, describe and preserve ephemeral artifacts of interest and helping to ensure that they survive over time to share with the future.
Thoughts on citizen archivists and their importance to institutions like ours was running through my head last night as we hosted Ian MacKaye of the independent label Dischord Records in a presentation here at the Library.
MacKaye founded Dischord as a teenager with partner Jeff Nelson in 1980, and he and Nelson went on to form Minor Threat, a group that, along with Bad Brains, has been credited with introducing the DC hardcore ethic to an audience beyond Washington.
As a performer, producer and enthusiastic supporter, MacKaye has documented music coming out of the Washington, D.C. underground for the past 30 years. Much of the music and art he’s compiled is not being comprehensively addressed by major collecting institutions, but that’s not unusual.
There’s often a gap between an activity and the entrance of its artifacts into the halls of culture, even while the material may have long demonstrated cultural and economic merit. There are complexities to how popular culture and folk arts ultimately get embraced by cultural heritage organizations, but that gap in time is a key concern to the digital stewardship community. The concern is that valuable materials will be lost through the ravages of time without intervention, supporting the idea that we need to creatively engage with citizen archivists to help identify important materials early in their lifecycle and to assist in their long-term care.
Richard Cox touches on these issues in an interesting 2009 paper, Digital Curation and the Citizen Archivist. In it he provides a conceptual argument for how the current interest in digital curation might guide professional archivists as they embrace partnerships with private collectors to capture materials while concurrently developing training programs to assist private citizens in how to preserve, manage and use digital personal and family archives.
The evolving nature of personal digital archiving is described succinctly in this passage:
In the past, while there has always been tension between private and public (institutional) collectors, it has been the institutional collectors – archives, libraries, museums, and historic sites – that have won out. In the future, there may be less certainty about this, especially as so many personal papers are digitally born and pose challenges to the public archives. The good news is, however, many private citizens care as passionately about the documents as do the institutional repositories.
As Cox notes, “personal collecting can seem quirky or frivolous, but it always reveals some deeper inner meaning to life’s purpose.” The “citizen archivist” takes collecting to another level by engaging with materials in a deeper, more systematic way.
We’ve tried to address the need for guidance with our personal digital archiving materials and the National Archives has a program for engaging with citizen archivists, but more work needs to be done, along the lines of what Cox is exploring, on how to help train the people who, as private citizens, will be caring for some portion of our future digital collections, whether they recognize that they’re “citizen archivists” or not.
MacKaye doesn’t really consider himself a citizen archivist, but the work that he’s been doing over the past thirty years goes beyond the level of mere collecting and provides models for creative ways to gather and provide access to archival materials.
After his time in Minor Threat he helped form the band Fugazi, who released seven albums and played more than 1,000 concerts in 50 states and several foreign countries between 1986 and 2003. The band’s sound engineers recorded more than 800 of these shows.
“I’d say it was for posterity, but to what end, we had no idea,” Mr. MacKaye said in a New York Times article in 2011. “As with a lot of collections, once we had a couple hundred tapes, we just continued to amass them. Why stop? We’d already gotten this far.”
Almost any collection becomes interesting once you get enough stuff in one place, but there was obvious cultural and historic value to a collection of Fugazi live concerts. The label started the Fugazi Live Series in 2004 and moved it to the web in 2011 to eventually make available a complete archive of the Fugazi concert experience.
One of the many interesting things about the Fugazi live series is how the label designed the web site to bring the audience into the equation, supporting user submissions of photographs, fliers and other ephemera related to each event, deepening the engagement while offering an opportunity for the audience to share their personal experiences and stories.
Citizen archivists’ proximity to events also allows them to be more proactive about making materials available. As MacKaye said last night, “somewhere down the road, some kid very much like me will be interested in what was happening during this time. Because, most of the time, what was happening in the past has always been curated by the mainstream media industry. They’re the ones that decided about the history of rock.”
Whether we recognize it or not, we’re surrounded by citizen archivists. The challenge for us in the information professions is to find creative ways to support and channel the wonderful energy going on in the wider world to ensure that diverse materials of incredible value survive over time.
P.S. There are more photographs of the event on our Facebook page and we welcome your submissions. The event was recorded and will be made available for webcast shortly. Follow our Twitter feed for the latest information.
Historicizing the Digital for Digital Preservation Education: An Interview with Alison Langmead and Brian Beaton
In this installment of the NDSA innovation working group’s ongoing series of innovation interviews I talk with Alison Langmead and Brian Beaton about the approach they are taking to teaching Digital Preservation at the University of Pittsburgh. Alison holds a joint appointment in the Department of the History of Art and Architecture and the School of Information Sciences. Brian holds an appointment in the School of Information Sciences. In this interview we explore how they approach teaching digital preservation. You can read the syllabus for the course here.
Trevor: Could you give us a quick overview of your digital preservation graduate course?
Alison: Sure. Brian and I were interested in reframing the contemporary practice of digital preservation as an imperfect and ongoing response to the history of digital culture. For example, decisions made in the 1940s, 1950s and 1960s about computing architecture still affect our work today, and we thought it was crucial for our students to not only understand today’s tools, but also to engage critically with the complex, layered legacy of information technologies.
Brian: We were also interested in teaching people to tack between past and present while making decisions about the objects in their stewardship. Building on Alison’s point, we wanted to situate digital preservation problems as outcomes and effects of choices, activities, and interactions over time that involved a tremendous range of human and non-human actors (although, I should add, we focused on the U.S. due to the typical career trajectory of our students at the University of Pittsburgh).
Alison: Indeed. To this end, we organized the 15-week course into two parts. In the first part, we focused our attention on primary source documents that captured the messy and contingent nature of emergent digital culture and its preservation. We began with texts from the 1940s and 1950s, working towards the present by decades, but as we approached the 1990s, we began examining ever-smaller increments of time. Each week, we would read documents produced only during the time period in question, concentrating on the ways in which human actors in the past understood digital technologies. The second part of the course was devoted to lab work and student presentations.
Brian: I would describe our approach as Media Archaeology meets Historical Epistemology. We tracked ideas, knowledge, machines, platforms, practices, and actors as they mutated over time— eventually congealing into something now commonly called digital culture, which presents a host of unique complications and challenges when it comes to its preservation.
Trevor: What do you see as the advantages of taking this approach to teaching digital preservation?
Brian: One key effect of this course design was that students were introduced to the computerization of American life as a continually unfolding interplay between technological obduracy and obsolescence. In the labs, we then encouraged students to apply that knowledge to contemporary information management problems. We also tried to model an outlook and sensibility that we believe is necessary for anyone interested in the preservation of digital culture; we instructed our students to conceptualize themselves as existing and operating in a moment that will likewise be rendered obsolete, perhaps soon. As information professionals interested in digital culture, they will have to constantly toggle between now-time, then-time, and future-time. To work in this area requires not just an understanding of data and files, but a whole set of physical and cognitive routines, aptitudes, and maneuvers. Our approach, I hope, captured some of the complexities around digital preservation and the tricky positioning of anyone working in this area.
Trevor: Alison, you have a background in Art History and Brian has a background in Science and Technology Studies. To what extent do you see each of those backgrounds structuring or changing how you approach digital preservation?
Alison: Brian and I both hold a firm belief in the importance of the historical contextualization of current-day information practices. We tried to present the history of digital culture in the United States as a critical piece of knowledge that preservationists can bring to bear on the effective stewardship of digital objects over the long-term. In terms of my own background, my training in the concrete and abstract issues surrounding material culture often leads me to emphasize visual knowledge and the impact that materiality can bring to a problem. Discussions about digital preservation concern the material manifestations of decades worth of decision-making.
Brian: My background often leads me to emphasize the social production of knowledge and the cross-traffic between “experts” and society. In terms of structuring my approach to digital preservation, I wanted students to leave the course as emergent experts in digital preservation and stewardship but also as deeply aware of the gaps and limitations in their own knowledge, and aware of the need for continuous re-training and re-tooling as they come to manage digital things in their everyday work lives. We also presented the professional conversations around digital preservation and stewardship as far from singular, unified, or coherent. Presenting the field as perpetually unsettled seemed more faithful to reality and more likely to position our students as critical, self-aware practitioners.
Trevor: How did you decide on how to periodize the history of computing in your course design?
Brian: In some ways the choice was arbitrary, structured by the limits of an academic term. We wanted the last few classes before the labs to focus on the most current research in this area, and then we worked back from there.
Alison: Also, in some ways the choice was tactical and meant to disrupt common periodizations of computing history. We wanted our students to think of this history as contested and open to re-periodization. For example, we investigated how the computerization of occupational and personal realms occurred at different rates and times, and spawned equally uneven conversations about digital preservation that continue into the present day.
Brian: In fact, the issue of uneven technology diffusion and uneven response on the part of the information professions became a major theme of the course.
Trevor: It strikes me that there are two related but different values in historicizing digital preservation education. On the one hand, the artifacts now making their ways into libraries, archives and museums come from different historical periods and as such an internalist understanding of different digital technologies and their features and affordances is valuable. With that said, more broadly, there is a value in understanding that computing has a social and cultural history. That is, a significant part of understanding (or for that matter, preserving, describing, and interpreting) a digital object involves entering into the past as a foreign country and coming to see it as someone in a different historical circumstance saw it. I am curious to know if you see a similar tension between these two values for historicizing and if in designing your syllabus there was any tension between focusing on the internalist story of devices and technologies changing over time and the externalist story of what those devices and technologies mean to different people in different historical contexts?
Alison: This tension is critical to our course design. In many ways, our entire course was predicated on this same observation. It is important to know both an insider’s history of computing as well as the social and subject effects of IT infrastructure.
Brian: This tension, I would add, is what makes digital preservation really interesting as an area of research, teaching, and practice. There are so many possible entry points into these uneven and overlapping conversations about the preservation of digital culture that emerged in the wake of computerization. There are also so many different zones of comfort and discomfort in any classroom. Some students might want to talk about data remanence or reconstructing hard drives or building the perfect emulator. Other students might want to talk about the work itself: project management, blurrings between consumption and production, or staffing and labor issues. Many students also arrive at the topic with a broad interest in the social and cultural history of technology. To address the second part of your question about coverage within the course itself, our effort to navigate between the internal and external, I think one of the more interesting and generative moves that we made involved reading outside the usual digital preservation literature. In preparing the course, we searched through field-specific journals in areas like nursing, banking, schooling, government, urban planning, and social science. Almost every nameable field has some version of a “Computers! What are they for?” article from the 1960s, 1970s, or 1980s. Reading these types of articles allowed us, as a class, to excavate the story of how specific machines and devices entered specific occupational realms. As instructors, we tried to call attention to subtle differences across domains that are often left un-named and lumped together.
Trevor: I’m curious about the extent to which some related notions like Media Archaeology can play into this historical approach to thinking about digital preservation. I interviewed Lori Emerson about her work on the Media Archeology Lab and I would be curious to hear what you see as the similarities and differences between the approach to your lab and the Media Archeology perspective Lori describes as informing her lab.
Brian: Your interview with Lori Emerson provides a wonderful distillation of Media Archaeology’s scattered intellectual origins and impulses. Our approach to teaching digital preservation shares a close affinity with Lori’s work at Boulder. Although we organized our course in “real time,” moving students experientially from the 1940s to the present, the only reason we moved chronologically was to capture and reveal subtle shifts in self-understanding and knowledge by the various human actors who were thinking, making, and doing with digital technologies. As I mentioned above, I would describe our approach as Media Archaeology meets Historical Epistemology. Thinkers like Ian Hacking and Lorraine Daston were just as influential on our course design as the various writers and thinkers named by Lori (e.g. Foucault, Kittler, et al.)
Alison: Perhaps one difference between our respective approaches, if I had to name one, is that our course focuses equally on historical components as well as on present-day electronic record-generating activities and the practice of digital preservation. Part of my own training is in the field of active information management, and I bring this training to the classroom with examples of current-day practices, policies and decisions. Digital preservation professionals continue their “training” every day by participating in their own digital cultural context. Policy decisions, the selection of particular hardware and software for the workplace and the home—all of these things are a part of the larger context shaping the ongoing conversations around digital preservation. Some key questions I raised as part of our class: How does the way we use information technology now impact how we treat historical objects? How does what we know about the past impact the way we, say, file our emails for future use? Does it make us think differently about using a site like Tumblr for our own personal purposes? How might the digital preservation profession play a part in actively and consciously constructing digital culture now and into the future? After all, this profession can (and has) made a profound impact on the ways in which people visualize their relationship to technology—an awesome responsibility.
Trevor: I know you are also working on a Digital Humanities Research network at the University of Pittsburgh. How do you see the relationship between your approach to digital preservation and your approach to digital humanities? Are these two parts of the same thing? Are they at odds with each other? Further, I saw some work from new media studies scholars, like Lev Manovich, on your syllabus. So, how do you see new media studies fitting together with digital humanities and digital preservation?
Alison: Yes, Brian and I are both involved with a group called the DHRX: Digital Humanities at Pitt. We are trying to create a strong but informal network of faculty who actively use digital technologies in their work, whether that be digital production or the use of digital methods to facilitate humanities-based research. Digital preservation strategies are always in the forefront of my mind when using technology in my research, and I am often in a place of being able to provide advice and collaborative support to my colleagues. If we do not consider how DH work will persist into the future (or even if we want our work to persist into the future), we are not, in my opinion, doing complete justice to our efforts.
Brian: I would describe new media studies, digital preservation, and digital humanities as organizational artifacts of our sociotechnical moment and as effects or symptoms of something larger happening at the intersection of people, information, and technology. In terms of our course design, we especially wanted to prepare our students to support new media and DH projects as they age, corrode, and ossify. I’ve written elsewhere about the “adaptive reuse” of “other people’s digital tools,” something I partly framed as a sustainability practice. In fact, because the principal contacts for the DHRX group at Pitt (Alison and myself) are also the people teaching digital preservation, the preservation side of DH is something I would really like to develop further as an area of research, teaching, and practice. If we take seriously past patterns and future predictions about obsolescence, then something like Preserving DH is already a long overdue anthology.
Alison: I agree very much with Brian that these academic fields seem like artifacts or affordances of something larger, not yet quite recognizable. Recent academic trends towards technology-oriented transdisciplinarity have demonstrated the benefits and the disadvantages of different scholarly communities coming together to work as groups. That might explain some of the simultaneity in terms of new media studies, digital preservation, and digital humanities. We have seen that some groups protect their identities so strongly that collaboration becomes impossible, while others have such a loosely-defined structure that they do not come to the table with any solidity, again making collaboration difficult. One possibility is to embrace co-existence and avoid worrying too much about academic fields, boundaries, and borders. Another possibility is to ask how we might bring these intellectual and methodological streams together productively without homogenizing the mixture and without just being strange bedfellows. In thinking about that very question, I am currently working with colleagues and graduate students on envisioning a course focused on digital materials and methods that will focus on this convergence and non-convergence of solid and ephemeral groups of actors grappling with digital culture in distinct but sometimes similar ways— some of whom study the digital, some of whom create in the digital, some of whom coopt the digital, some of whom reject the digital, and some of whom do all of these things and more, of course. We are playing around with the notion that to do this, we might best remove the human actors from the spotlight, and replace them with the technologies themselves. We often think of digital culture in terms of people and their material coagulations of mobile devices, desktop machines or pervasive sensor technologies, but what might the landscape of user interactivity as seen from the point of view of an embedded sensor teach us? What would a digital humanities/digital studies/digital preservation course look like from the point of view of the interface itself?
Brian: It sounds to me like this new course that you’re developing is Media Archaeology meets Historical Epistemology meets Actor-Network Theory meets Thing Studies…and the goal of the course is to investigate, as a set of interlinked symptoms or effects, the work happening in New Media Studies, Digital Preservation, and Digital Humanities. That’s pretty thick, elegant, and interesting. In closing, perhaps one further observation that can be made regarding our attempts to historicize “the digital” in digital preservation is that it seems to require a whole lot of aggregation: the combining of methods, terms, ideas, techniques, and theoretical tools from a wide range of literatures—which is only possible due to recent advances in search engines, databases, journal digitization projects, et cetera. Our course on the preservation of digital culture was designed and implemented by leveraging a good deal of present-day digital culture to dream up the structure and aggregate the content. That means our course, like the tools and technologies that we used to build it, may soon become obsolete. To me, that’s the best part of teaching digital preservation. It demands constant innovation.
The perfect digital preservation system does not exist. It may someday, but I don’t expect to live to see it.
Instead, people and organizations are working on iterations of systems, and system components, that are gradually improving how we steward digital content over time. This concept of perpetual beta has been around for a while; Tim O’Reilly explained it lucidly in What Is Web 2.0 in 2005.
I gave a presentation recently in which I was expressing hope that prospective infrastructure developments for stewarding big data would bring benefits to the work of libraries, archives and museums to preserve digital content.
My intent was to convey that change should be iterative along a path to radical. In the spirit of avoiding bulleted presentation slides wherever possible, I searched for graphics that might help tell the story.
The one I ended up using was a picture from the Norfolk Record Office (UK) that showed delivery of a computer system some years ago. In its day, the Elliott computer was an advanced machine that cost the modern equivalent of nearly a million dollars. It read paper tape at 500 characters per second and had a CPU that was stored in a “cabinet about 66 inches long, 16 inches deep and 56 inches high.”
The picture got a good response from the audience and I wondered if perhaps I should have used others, perhaps from a later era, such as this one from Bell Labs in the late 1960s. This IBM mainframe was many iterations ahead of the Elliott, but any computer big enough to hide in surely needed to be delivered by truck as well.
These pictures are useful in illustrating a point that Clay Shirky and others made some time ago: the system should never be optimized. In other words, iteration and change should be embraced as a design principle. Any system surely can be improved, often radically, in the future. And, as time passes and successful migrations occur (our intent), the way we used to do things will inevitably seem quaint in retrospect.
New Video: Digital Preservation at the Library of Congress’s Packard Campus for Audio Visual Conservation
We produce occasional short videos related to digital preservation. These videos address such topics as personal digital archiving, adding descriptions to digital photographs and the K-12 Web Archiving program, to name a few.
Our newest video profiles one of the Library of Congress’s most magnificent treasures: the Packard Campus for Audio Visual Conservation, located in the foothills of the Blue Ridge Mountains in Culpeper, VA.
This state-of-the-art facility resides inside Mt. Pony, a high-security facility formerly occupied by the Federal Reserve Bank. The facility was completely rebuilt and optimized for the preservation of material and digital audio and visual items. David Packard’s Packard Humanities Institute funded the renovation, in large part.
The Packard Campus opened in 2007. It houses the Library’s vast collection — nearly 5 million items — of motion pictures, audio recordings, television and radio broadcasts, videos and video games; many are in obsolete formats. The material items in the collections date from the late 19th century onward.
Our new video showcases the Packard Campus as a world leader in the preservation of born digital and digitized collections. It shows how the Packard Campus gathers born-digital collections shipped on drives, ripped from CDs and DVDs, transferred over networked cable and captured from live broadcasts.
The video also shows how the Packard Campus digitizes material collections. For example, SAMMA robots digitize videotape in batches around the clock, specially designed machines digitize rare early paper-print films and the IRENE system uses lasers to map the grooves of fragile recordings without risking further damage to the grooves through contact with metal record-player styluses.
In the last step of the digital-file journey, high-capacity servers pull in the digital collections and transfer them to backup drives and tapes for storage. The repository is designed to anticipate large-scale expansion of the digital collections, as well as power and cooling needs of the server hardware.
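The video doesn’t detail the Packard Campus’s actual tooling, but a common building block in any transfer-and-backup workflow like the one described is a fixity check: compute a checksum before and after each copy and compare. Here is a minimal sketch in Python, with illustrative file paths:

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading in chunks so
    large audiovisual files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare checksums from before and after a transfer; a mismatch
# means the copy should be re-run. These paths are illustrative.
source = "ingest/reel_001.mov"
copy = "archive/reel_001.mov"
if sha256_of(source) == sha256_of(copy):
    print("Fixity verified: copies match.")
else:
    print("Checksum mismatch: re-transfer the file.")
```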
The end result is not just long-term digital preservation; it’s remote access as well. The Packard Campus serves some digital items from its repository over the network to researchers at A/V stations 70 miles away at the Library of Congress’s Audio Visual reading rooms in Washington, DC.
This is a guest post by Jose “Ricky” Padilla, a HACU intern working with NDIIPP.
More and more cultural heritage organizations are inviting their users to tag collection items to help aggregate, sort and filter those collections. If we can better understand how and why users tag, and what they’re tagging, we can better understand how to invite their participation. For this installment of the Insights series I interview Jennifer Golbeck, an assistant professor at the University of Maryland, Director of the Human-Computer Interaction Lab and a research fellow at the Web Science Research Initiative, about her ongoing studies of how users tag art objects.
Ricky: Could you tell us about your work and research on tagging behaviors?
Jennifer: I have studied tagging in a few ways. With respect to images of artworks, we have run two major studies. One looks at the types of tags people use. The other compares and contrasts tags generated by people in different cultures.
In the project on tag types, we used a variation of the categorization matrix developed by Panofsky and Shatford. This groups tags by whether they are about things (including people), events, or places and also by whether they are general (like “dog”), specific (like “Rin Tin Tin”), or abstract (like “happiness”). We also included a category for tags about visual features like color and shape. We found that people tended to use general terms to describe people and things most commonly. However, when they are tagging abstract works of art, they are much more likely to use tags about visual elements.
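To make the matrix concrete, here is a toy sketch in Python. The tags and their facet/level assignments are invented for illustration; they are not the study’s actual data or coding procedure.

```python
from collections import Counter

# Hypothetical hand-coded tags as (tag, facet, level) triples. The
# facets and levels follow the Panofsky/Shatford-style matrix
# described above; the assignments are invented for illustration.
coded_tags = [
    ("dog", "thing", "general"),
    ("Rin Tin Tin", "thing", "specific"),
    ("happiness", "thing", "abstract"),
    ("battle", "event", "general"),
    ("red", "visual element", "n/a"),  # color/shape tags sit outside the grid
]

# Count how many tags fall into each cell of the matrix.
cell_counts = Counter((facet, level) for _tag, facet, level in coded_tags)
for (facet, level), count in sorted(cell_counts.items()):
    print(f"{facet} / {level}: {count}")
```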
My PhD student Irene Eleta led our other study. She asked American native English speakers and native Spanish speakers from Spain to tag the same images. She found differences in the tags they assigned which were often culture specific. For example, on Winslow Homer’s “The Cotton Pickers”, Americans used tags like “Civil War” and “South” which Spanish taggers didn’t. This illustrates how translating tags can open up new types of access to people who use different languages and come from different cultures.
Ricky: Is there any of your research that you find would be particularly beneficial to those interested in digital stewardship?
Jennifer: Irene Eleta’s work on culture and language is very interesting. I think this is a relatively unexplored area, and there is so much that can be done by combining computational linguistics, other computing tools and metadata like tags to improve access.
Ricky: In your talk for the Digital Dialogues at the Maryland Institute for Technology in the Humanities you presented three research projects using tags on art. Could you give us some background on the research that helped inform your work in this area?
Jennifer: I come from a computer science background, so I am far from an expert in this area. I read up a lot on metadata and some existing tools and standards like the Art & Architecture Thesaurus. We also worked with museum partners who brought the art and museum professional perspective, which was very helpful.
Ricky: You explained in the talk that understanding what people are tagging and why can help us design better tagging systems. Could you elaborate on this idea?
Jennifer: Tags have been shown to provide a lot of new data beyond what a cataloger or museum professional will usually provide. However, to maximize the benefit of tags, it helps to understand how they will improve people’s access to the images. Worthless tags do not help access. Our work is designed to understand what kinds of tags people are applying. This can help in a few ways. First, we can compare this to the terms people are searching for. If search terms match tags, it definitely reveals that tags are useful. Second, we can see if tags are applied more to one type of image than another. For example, I mentioned that people use a lot of color and shape tags for abstract images. This means if someone searches for a color term, the results may be heavily biased toward abstract images. This has implications for tagging system design. We might build an interface that encourages people to use visual element tags on all images or we might use some computer vision techniques to extract color and shape data. At the core, by understanding what people tag, we can think about how to encourage or change the tagging they are doing in order to improve access.
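The first comparison Jennifer mentions, checking whether search terms show up among tags, reduces to a set intersection. Here is a toy sketch in Python; the query log and tag data are invented:

```python
# Invented samples of search-log terms and user-applied tags.
search_terms = {"blue", "portrait", "civil war", "abstract art", "dog"}
image_tags = {"blue", "geometric", "portrait", "dog", "happiness"}

# Search terms that taggers also used are direct evidence that
# tagging improves retrieval for real queries.
overlap = search_terms & image_tags
coverage = len(overlap) / len(search_terms)

print("Search terms matched by tags:", sorted(overlap))
print(f"Coverage: {coverage:.0%}")
```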
Ricky: Has your research uncovered any ways to encourage tagging? If so what are some of the factors which encourage and discourage tagging?
Jennifer: We haven’t made it to that point yet. We have uncovered a number of results that suggest how we can begin to design tagging systems and what we might want to encourage, but how to do this is still an open question.
Ricky: In a study you compared tags from native English speakers from the USA and native Spanish speakers from Spain. Could you tell us a little about the findings of this investigation and how cultural heritage institutions could benefit from this research?
Jennifer: (I described this work a bit above). Cultural heritage institutions can benefit from this in a couple ways. If they have groups who use different languages, they can provide bridges between these languages to allow monolingual speakers to benefit from the cultural insights shared in another language. This can be done by translating tags on the back end of the system. It also suggests that in order to open up their collections to other cultures, language tools will be important.
Ricky: You mentioned automatic translations could help in improving the accessibility of digital collections but it was more complex than that. What are some of the pros and cons of automatic translation which you came across in your research?
Jennifer: I discussed some of the pros above. However, automated translation is a hard problem, especially when working with single words. For example, disambiguation is a classic problem. If you see the tag “blues”, does it refer to the colors or to the music? When there is surrounding text, a tool can rely on context, but that is much harder with tags. If we want to rely on translation, we will have to do more work in this area.
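A toy sketch of the “blues” problem: a sense-aware translation table (invented here, not drawn from a real lexicon) shows why an isolated tag leaves an automated translator guessing.

```python
# A sense-aware translation table for tags. The Spanish glosses are
# illustrative only, not from a real bilingual lexicon.
translations = {
    "blues": {"color": "azules", "music": "blues"},
    "portrait": {"art": "retrato"},
}

for tag, senses in translations.items():
    if len(senses) > 1:
        # With no surrounding text to supply context, an automated
        # translator can only guess which sense the tagger meant.
        print(f"Ambiguous tag {tag!r}: candidates {senses}")
    else:
        print(f"Unambiguous tag {tag!r}: {next(iter(senses.values()))}")
```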
Ricky: Is there any other work you would like to do with data from these studies, like the recordings of the eye-tracking sessions?
Jennifer: We have eye tracking data for people tagging images and looking at images. We also have it for people who spent time looking at an image for a while before tagging it and for people who began tagging immediately. It would be interesting to compare those to see how people look at art when they are given a task compared to when they are simply asked to look at it. Also, we can compare how people tag when they are familiar with an image vs. when they are seeing the image for the first time.
The following is a guest post by Nicholas Taylor, Data Specialist for the National Digital Information Infrastructure and Preservation Program.
A previous post on this blog explored why it’s so hard to come up with a reliable measurement of the average lifespan of a webpage. In essence, the argument came down to this: links and the websites they represent tend to become decoupled over time. Without a broad understanding of how that process takes place, it’s hard to make definitive claims about the persistence of websites when available automated tools can only capably check for the persistence of links.
In an ideal web, webmasters would adhere to Tim Berners-Lee’s notion of “cool URIs” – links that have been purposely maintained so as to remain stable. Stable links are more useful to users, and it is technically feasible to maintain any particular link for at least the lifespan of the resource it points to. However, given both the popular perception of and the abundance of scholarly literature on link decay, it’s probably safe to say that Tim Berners-Lee’s vision for a cool URI-enabled web hasn’t yet been realized.
The good news is that websites are more durable than links. This is supported by multiple studies and makes intuitive sense, as well. The bad news is that most contemporary web archiving tools are actually link archiving tools; they are designed to agnostically capture and replay the content represented by links, not the intellectual objects (i.e., the websites) of interest per se. For the Library of Congress thematic web archives, we can only assure that the links we’re capturing continue to correspond to the websites we care about preserving by manually inspecting them on an ongoing basis.
To better understand the discrepancy between link and website persistence as well as the disposition of websites that we previously archived, intern Heidi Hernandez and I revisited 1,071 links archived as part of the U.S. Election 2002 web archive collection. We excluded over 1,000 links corresponding to electoral candidate websites, as they were especially short-lived. The remaining links corresponded to state government, political party, advocacy group, newspaper, and political blog sites.
We followed a two-part methodology. First, following the approach of many other link persistence studies, we ran the entire list of links through a link checker and recorded the http response codes (e.g., 404, 200, 301, 500, etc.). Second, we visited each of the links and noted whether the corresponding website was the same as that which was archived. If the website was different, or if the link didn’t work, we tried to discover the new location of the website using search engines.
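The post doesn’t name the link checker the study used, but the first step of the methodology can be sketched in a few lines of Python with the requests library; the URLs below are placeholders, not the study’s actual link list.

```python
import requests

# Placeholder URLs standing in for the archived link list.
links = [
    "http://www.example.com/election2002/",
    "http://www.example.org/advocacy/",
]

results = {}
for url in links:
    try:
        # Disable redirect-following so 301s are recorded as such
        # rather than silently resolved to their targets.
        response = requests.get(url, timeout=30, allow_redirects=False)
        results[url] = response.status_code
    except requests.RequestException as exc:
        results[url] = "error: " + type(exc).__name__

for url, status in results.items():
    print(url, status)
```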
There were a few noteworthy findings:
- Taking the link persistence measurement as a measurement of website persistence overestimated the latter, because some working links pointed to different websites. In our study, 8% of working links pointed to different websites.
- Taking the link persistence measurement as a measurement of website persistence also and more significantly underestimated website persistence, because websites often existed at new locations even if their previous link either no longer worked or now corresponded to a different website. In our study, 82% of websites associated with non-working links and 48% of websites whose links now corresponded to different websites still existed.
- In aggregate, 94% of the previously-archived websites still existed, 3% more than would’ve been predicted by checking links alone.
This last point should most certainly not be interpreted as a sign of the superfluity of web archiving; recall that over 1,000 links to now-disappeared websites were excluded from the analysis. Also consider, for example, that just because the White House website still exists eleven years after we archived it as part of the U.S. Election 2002 web archive collection doesn’t mean that any of the resources that made up the White House website of eleven years ago are still accessible now.
All in all, though, the results suggest a more complicated picture of the ephemeral web than the popular conception that tends to conflate the disappearance of links with that of websites.
We have moved so far so fast with personal computing that older machines are acquiring a cultural patina. Everyone, seemingly, has a memory of “old computers,” even if some people think having a hard drive under 100 gigabytes fits the definition.
There are perhaps two ways to think about obsolete computers. One is as trash or e-waste, which is a serious environmental problem. The issue has been building for years as computers and related peripherals age-out after a few short years and are replaced by equipment that itself will be tossed in the near future. Even if they work just fine, older machines often are perceived to be too slow, too clunky or too uncool to keep around. Recycling is possible, but it doesn’t always happen the way it should, resulting in exposures to dangerous chemicals and other materials.
Ironically, some older machines that escaped being dumped have a second life that far exceeds their original intended purpose. All you have to do is glance at the vintage computing section of an online auction website to see how valuable certain kinds of equipment have become. And, if you are lucky, you can even find good stuff for free: I liberated a fully functional Osborne 1 portable computer from a trash heap a few years ago, for example.
The rarest personal computers are the original models dating back to the 1970s. I found a great picture on Wikimedia that shows some of the earliest models, now exhibited at the Computer History Museum in Mountain View, California.
All this goes to say that if you know about a stash of old computer equipment it might be worth checking to see if it has secondary value. Older machines can live on for functional purposes, such as reading old software. Or they might simply have aesthetic value as reminders of the early days of computing. Either possibility beats adding to the e-waste problem.
Image scanning of one sort or another has been in common usage in some industries since the 1920s.
Yes, really, the 1920s.
The news wire services used telephotography — where images are captured using photo cells and transmitted over phone lines — well into the 1990s. Scanners and digital cameras like those we are familiar with came out of development in the 1960s and 1970s, and were already hitting the commercial market by the 1980s.
I have vivid memories of my first digitization project, because that project changed the course of my career.
In 1986 I was in graduate school and volunteering for the Fowler Museum of Cultural History at UCLA. One day the Collections Manager came down to the archaeology collections in the sub-basement (where I was surveying the human skeletal remains in the collections for our NAGPRA records) and said to me: “How would you like to move from the sub-basement to the basement?” How could anyone say no to that?
The project was to do a recon on all the paper records and enter them into the brand new Argus system running on a mini-mainframe. I am pretty certain that we were Questor’s second customer, after the Southwest Museum. While the recon project taught me the basics about what became the focus of my career — collection records management, digitization, system administration, being a DBA, working with authority control and creating multilingual controlled vocabularies — what was particularly exciting about the system was that it had the capacity to link to digital images.
So we started digitizing. We had acquired a particularly exciting and important archaeological collection, and I had the opportunity to work on the digitization. The objects were set on a stand and the image was captured via a video camera and written to tape, with a video titler used to embed the accession number into the image. The tapes were then mastered onto laser disks.
Now, this was very cutting edge – one entered an address for an image on a laser disk into a field in the object record, and the system could address the file on the laser disk and display it on a dedicated terminal. We had an early Sony Mavica camera, which used 3.5″ floppy disks as its storage media. And we had a printer, which printed color photos the size of old school Polaroids. It was heady stuff.
In 1988 I attended my first Museum Computer Network conference, another event that shaped my career. The 1989 MCN meeting was the pivotal one. We had our first meeting of a Visual Information SIG, where at least a dozen organizations shared their experiments, successes, and failures with digital imaging. I still have my write-up from that meeting, which appeared as a column in Spectra. I chaired that group for many years, and that group helped build a community around imaging practice that still exists.
Of course there were many early leaders and innovators in digital imaging. The American Museum of Natural History. The Fine Arts Museums of San Francisco Thinker imagebase. The Library of Congress American Memory project. Harvard University’s libraries and museums. Numerous Smithsonian projects. And too many others to name.
What other imaging projects were people involved in during the 1980s? If you are interested in the history of digital imaging I suggest the Digital Imaging page at CoOL, which includes a great historical bibliography. Not all the links work, but it’s a great jumping-off point for a history of the discipline.