The Signal: Digital Preservation

Illinois Library Offers Digital Preservation Aid in Response to Tornado Devastation

8 January 2014 - 5:19pm

It was raining hard on Sunday morning November 17, 2013, as librarian Genna (pronounced “Gina”) Buhr anxiously watched the Weather Channel coverage of the storm system battering central Illinois. Buhr, Public Services Manager at Illinois’ Fondulac District Library, was visiting her parents in Utica, Illinois, about an hour north of Buhr’s home. Her two young children were with her. It had been a relaxed Sunday morning, with everyone lounging in their pajamas, but the increasing severity of the weather gradually changed the mood in the house. At 8:40 a.m., the National Weather Service issued a tornado watch.

Buhr’s parents were as anxious as she was; in 2004, a tornado hit their small valley community, resulting in the death of nine people and 100 damaged or destroyed homes. By mid-morning, Buhr and her parents changed their clothing, put on their shoes and prepared to take shelter in the small basement of the 100-year-old house if the emergency signal sounded. Buhr’s father had designated a secure spot in the old coal room and set aside jugs of emergency water, just in case. They were as ready as they could be.

Shortly before 11:00 a.m., a tornado did touch down in Illinois, but it was far away from them, about 60 miles south. It was close to Buhr’s home.

Photo by the National Weather Service.

The tornado formed southeast of East Peoria, not far from Fondulac District Library, and for almost an hour it moved steadily northeast, growing in strength as it traveled. Winds accelerated and peaked at 190 mph as the tornado ravaged the town of Washington, tossing cars, leveling houses and grinding everything in its path to rubble as it plowed on. Eventually, 46 miles away from where it touched down, it weakened and dissipated.

Photo by the National Weather Service.

When Buhr felt it was safe to return home, she called some friends, family and co-workers to make sure they were alright. Then she put her kids in the car and headed out, passing debris, overturned vehicles, downed power lines and road diversions along the way.

When she got home she saw that her house was safe, with only a few small downed branches, so she left her kids in the care of her mother-in-law (her husband was in Florida) and went to check on the library.

The new Fondulac District Library building celebrated its grand opening to the public only two weeks before the tornado hit the area. The building was deliberately designed to be open, airy and filled with natural light. Three of the four exterior walls are almost completely glass, as is the three-story tower in the center of the building.

When Buhr arrived she was relieved to find her staff OK and the library barely damaged, so she set about almost immediately to mobilize her staff to help the tornado victims. Buhr had first-hand “aftermath” experience helping her family clean up after the 2004 tornado in Utica, and she was inspired by the supportive community spirit — how a lot of volunteers just showed up to help. Similarly, she and the library staff resolved to offer whatever resources they could, beginning with a centralized information resource for the community.

That afternoon she and her staff compiled a web page packed with storm assistance information. They listed emergency phone numbers, phone numbers for utilities, phone numbers for claims divisions of insurance companies and contacts for charitable assistance. Buhr also managed the social media posts that appeared almost instantly after the storm. “In the immediate hours after a disaster, there’s a lot of miscommunication through normal channels,” said Buhr. “If people could contact the library, we’d do our best to get them the answers and information they needed using the resources we had available, including our knowledge of the community and our research skills.” The web page invited people to come to the library to use the electricity and the computers, charge their phones and use the wifi. The library offered the use of a video camcorder so people could document damage. Or people could come in just for comfort. The web page stated, “Visit us to escape the elements – cold, wind, rain – or the stress for a moment or two. Read a book, the newspaper, or a magazine. Play a game. Unwind.”

Heather Evans, a patron of Fondulac District Library, has a special interest in preserving digital photos. Evans contacted Buhr to note that, in a post-disaster cleanup, damaged photos are often overlooked and discarded; Evans suggested that the library might be able to help the community digitize their damaged photos so the electronic copies could be backed up and preserved. Evans even offered to set up her personal copy-stand camera and to digitize photos for those affected by the disaster. Buhr thought it was a terrific, novel idea. “The project fit the features and priorities of the library in a unique way,” said Buhr. “We weren’t collecting water bottles or supplies or anything physical for distribution. Other organizations had that covered. We decided to rely on the skills and talents of our staff and volunteers to offer something equally meaningful and important that maybe other organizations could not.”

Genna Buhr

While doing some research for the project, Buhr came across Operation Photo Rescue, a 501(c)(3) charity organization of volunteer photography enthusiasts that help rescue and restore damaged photos, particularly after natural disasters. Buhr consulted with OPR’s president Margie Hayes about OPR’s methods, about how Fondulac District Library’s project might work and to ask if OPR would be interested in collaborating. “We don’t have the Photoshop skills that Operation Photo Rescue’s volunteers do,” said Buhr. “We don’t have restoration capabilities here. But it would be a step in the right direction if we could at least get the digitization portion done.”

Within a few days, she had the commitment, the staff and the equipment for the project, which they dubbed Saving Memories. The next step was to get storage media on which members of the community could save their newly digitized photos. Buhr figured that some of the library’s vendors might have flash drives and thumb drives to spare, so she emailed them, explained the Saving Memories project and asked for donations of flash/USB drives. The response was overwhelming. Within days, the Fondulac District Library received more than 2,500 flash/USB drives. The library was ready. Once people had their scans in hand, all that remained was to back up and care for their digital photos in accordance with the Library of Congress guidelines.

Less than two weeks after the tornado hit, Fondulac District Library set up Evans’ copy-stand camera scanning station and held its first Saving Memories session. To the staff’s disappointment, no one came.

“I did feel it was going to be a little early after the disaster, but it didn’t hurt to try it,” said Buhr. “It’s understandable though. It was a little too soon. People were still being reunited with their items, things that the storm blew away. They were still meeting basic needs, such as housing and transportation.” In fact, in the aftermath of the tornado around central Illinois, more than 1,000 homes were damaged or destroyed, there were 125 injuries and three deaths. So Buhr and her staff understood that the community had more important priorities than scanning photos. The trauma was still fresh and people had bigger concerns. Even Operation Photo Rescue doesn’t go into an affected community right after a disaster. They let people’s lives settle down a bit first.

Buhr is not frustrated or deterred. She has more sessions scheduled. She is coordinating with Operation Photo Rescue to hold a large copy run — basically a rescue session — at Washington District Library on February 21 and 22. They will offer further digitization services the following weekend, February 28 and March 1, at Fondulac District Library.

Buhr and her staff are looking beyond Saving Memories’ original goal of helping people salvage and digitize their photos. “We’re regrouping and thinking logistically — and bigger — about how this service can best benefit the community,” she said.

Fondulac District Library hopes to eventually get its own copy-stand camera setup so it can continue to offer a sophisticated photo digitization service. But that raises staffing issues. A qualified staff person, one trained in photography and the equipment, has to run it; sessions have to be scheduled and someone has to maintain the equipment. Such services need to be thought through carefully. Still, it seems like a logical step in the library’s ongoing service to its community.

“We offer public computers, scanners and copiers,” said Buhr. “Why not also offer the community the use of a copy stand camera scanner?”

Buhr also plans to expand the scope of the project. Fondulac District Library may eventually use the equipment to scan historic photos from the library’s collections. “Part of the attention drawn by the launching of our new library is to our local history collection,” said Buhr. In the old library building, the collection was buried in the basement and not easily accessible. In the new library, the collection is prominently displayed and accessible in the Local History room. Buhr wants to digitize and promote the collection more aggressively.

The actions of Buhr and the staff of Fondulac District Library demonstrate that libraries can help their communities in unexpected ways, including digital preservation and personal digital archiving. Buhr said, “The project is a good match for Fondulac District Library in that, in response to a disaster, the project uses the resources and the archival and preservation spirit that libraries have. The project really takes advantage of the broad abilities of the library and the skills of librarians in a unique way. The mission of our Saving Memories project captures the essence of some of the missions of libraries in general — preservation, information and service to the community.”

Categories: Planet DigiPres

The National Digital Stewardship Residency, Four Months In

7 January 2014 - 2:59pm

The following is a guest post from Emily Reynolds, Resident with the World Bank Group Archives

NDSR Christmas tree ornament. Photo: Jaime McCurry.

For the next several months, the National Digital Stewardship Residents will be interrupting your regularly-scheduled Signal programming to bring you updates on our projects and the program in general. We’ll be posting on alternate weeks through the end of the residency in May, and we can’t wait to share all of the exciting work we’ve been doing. I’ll start off the series with a quick overview of how it’s been going so far, and what you can expect to hear about in future posts.

After participating in immersion workshops for the first two weeks of September, we’ve been working at our host organizations to tackle their toughest digital stewardship challenges. Our work has been interspersed with group meetings and outings to professional development events; most recently, we heard from NYU’s Howard Besser at an enrichment session for the residents and our mentors. His talk centered around the challenges of preserving user-generated digital content, such as correspondence, email, and the disorderly contents of personal hard drives. The National Security Archive also hosted us for a tour and discussion of their work, where we were able to learn about some of the most prized (and controversial) items in their collection.

CIA behavior control experiments at the National Security Archive. Photo: Emily Reynolds.

A major component of the residency is encouraging and facilitating our attendance at, and participation in, professional conferences. We’ll be presenting twice at ALA Midwinter: a series of lightning talks at the ALCTS Digital Preservation Interest Group meeting, as well as slightly longer presentations at the Library of Congress’s booth. Stay tuned for more information about other conferences that we’ll be participating in, as well as our reports after the fact.

As part of the residency, we’ve been asked to provide updates on our projects on our individual blogs and Twitter accounts. You can follow our Twitter activity on this list, and find links to all of our blogs here. We’ll be coordinating some special features on our personal blogs over the coming months, including interviews with digital preservation practitioners, discussions with each other, and up-close explorations of our institutions and projects; those features will be linked to from our upcoming Signal posts. For now, I’ll leave you with a roundup of some of the NDSR news you might have missed over the past few months:

Project updates:

  • Heidi’s answer to the question “so what exactly is Dumbarton Oaks, anyway?”
  • Julia’s discussion of the work being done at the National Security Archive
  • Lauren’s collection of resources related to media preservation
  • Jaime’s theory that William Shakespeare would have been a web archivist

Conferences and events:

  • Erica’s report from the AMIA conference
  • Jaime’s recap of the Archive-It partners’ meeting
  • Maureen’s discussion of her experience at the Tri-State Archivists’ Annual Meeting
  • Molly’s summary of an Accessible Future workshop and posts about the DLF Forum
Categories: Planet DigiPres

File Format Action Plans in Theory and Practice

6 January 2014 - 4:38pm

The following is a guest post from Lee Nilsson, a National Digital Stewardship Resident working with the Repository Development Center at The Library of Congress. 

The 2014 National Agenda for Digital Stewardship makes a clear-cut case for the development of File Format Action Plans to combat format obsolescence issues. “Now that stewardship organizations are amassing large collections of digital materials,” the report says, “it is important to shift from more abstract considerations about file format obsolescence to develop actionable strategies for monitoring and mining information about the heterogeneous digital files the organizations are managing.”  The report goes on to detail the need for organizations to better “itemize and assess” the content they manage.

Just what exactly is a File Format Action Plan?  What does it look like?  What does it do? As the new National Digital Stewardship Resident, I undertook an informal survey of a selection of divisions at the library.  Opinions varied as to what should constitute a file format action plan, but the common theme was the idea of “a pathway.”  As one curator put it, “We just got in X. When you have X, here’s the steps you need to take.  Here are the tools currently available.  Here is the person you need to go to.”

For the dedicated digital curator, there are many different repositories of information about the technical details of digital formats.  The Library of Congress’ excellent Sustainability of Digital Formats page goes into exhaustive detail about dozens of different file format types.  The National Archives of the UK’s now ubiquitous PRONOM technical registry is an indispensable resource.  That said, specific file format action plans are not very common.



Probably the best example of File Format Action Plans in practice is provided by the Florida Digital Archive.  The FDA attempted to create a plan for each type of file format it preserves digitally.  The result is a list of twenty-one digital formats, ranked by “confidence” as high, medium, or low for their long-term storage prospects.  Attached to each is a short Action Plan giving basic information about what to do with the file at ingest, its significant properties, a long-term preservation strategy, and timetables for short-term actions and review.  Below that is a more technically detailed “background report” explaining the rationale behind each decision.  Some of the action plans are incomplete, recommending migration to a yet-unspecified format at some point in the future.  The plans have not been updated in some time, with many stating that they are “currently under discussion and subject to change.”

A related project was undertaken by the University of Michigan’s institutional repository, which organizes file formats into three specific targeted support levels.



Clicking on “best practices” for a format type (such as the above for audio formats) will take you to a page detailing more specific preservation actions and recommendations.  This design is elegant and simple to understand, yet it offers little detailed information about the formats themselves.

A broader approach still was taken by the National Library of Australia.  The NLA encourages its collection curators to make “explicit statements about which collection materials, and which copies of collection materials, need to remain accessible for an extended period, and which ones can be discarded when no longer in use or when access to them becomes troublesome.”  They call these outlines “Preservation Intent Statements.”  Each statement outlines the goals and issues unique to each library division.  The Preservation Intent Statement for NLA’s newspaper digitization project goes into the details of what they intend to save, in what format, and what preservation issues to expect.  This very top-down approach does not go into great detail about file formats themselves, but it may be useful in clarifying just what the mission of a curatorial division is, as well as providing some basic guidance.

There have been notable critics of the idea of file format action plans based on risk assessment.  Johan van der Knijff on the Open Planets Foundation blog compared the process of assessing file format risks to “Searching for Bigfoot,” in that these activities always rest on a theoretical framework, and that scarce resources could be better spent solving problems that do not require any soothsaying or educated guesswork.  Tim Gollins of the National Archives of the UK argues that while digital obsolescence issues are real in some cases, resources may be better spent addressing the more basic needs of capture and storage.

While taking those critiques seriously, it may be wise to take a longer view.  It is valuable to develop a way to think about and frame these issues going forward.  Sometimes getting something on paper is a necessary first step, even if it is destined to be revised again and again.  Based on my discussions with curators at the Library of Congress, a format action plan could be more than just an “analysis of risk.”  It could contain actionable information about software and formats which could be a major resource for the busy data manager.  In a sprawling and complex organization like the Library of Congress, getting everyone on the same page is often impossible, but maybe we can get everyone on the same chapter with regards to digital formats.
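To make the idea of “actionable information” tangible, here is a hedged sketch of what one record in such a plan might look like as a small Python data structure. The field names and the example values are hypothetical, loosely modeled on the elements the Florida Digital Archive includes in its plans (ingest actions, significant properties, preservation strategy, review timetable); they are not any institution’s actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class FormatActionPlan:
    """One hypothetical file format action plan record, loosely modeled
    on the elements in the Florida Digital Archive's published plans."""
    format_name: str                      # e.g. "TIFF 6.0"
    confidence: str                       # "high", "medium", or "low"
    ingest_actions: list = field(default_factory=list)
    significant_properties: list = field(default_factory=list)
    preservation_strategy: str = "retain as-is"
    review_date: str = ""                 # when to revisit this plan

# An illustrative entry; the specifics here are invented for the example.
tiff_plan = FormatActionPlan(
    format_name="TIFF 6.0",
    confidence="high",
    ingest_actions=["validate the file", "record a checksum"],
    significant_properties=["image width", "image height", "bit depth"],
    preservation_strategy="retain; migrate only if TIFF support erodes",
    review_date="2015-01",
)
```

Even a record this small gives a data manager the “pathway” the curators described: what to do at ingest, what matters about the file, and when to look at the plan again.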

Over the next six months I’ll be taking a look at some of these issues for the Office of Strategic Initiatives at the Library.  As a relative novice to the world of library issues, I have been welcomed by the friendly and accommodating professionals here at the library.  I hope to get to know more of the fascinating people working in the digital preservation community as the project progresses.

Categories: Planet DigiPres

The NDSA at 3: Taking Stock and Looking Ahead

31 December 2013 - 3:25pm


The end of the year is a great time to take stock. I’m currently in the “have I done irrevocable damage to my body during the holiday snacking season” phase of stock-taking. Luckily, the National Digital Stewardship Alliance isn’t concerned with whether anyone’s going to eat that last cookie and has a higher purpose than deciding whether the pants still fit.

The NDSA was launched in July 2010 but really got going with the organizing workshop held December 15-16, 2010 here in D.C., which makes this December the roughly 3-year anniversary of the start of its work. The workshop refined the NDSA’s mission “to establish, maintain, and advance the capacity to preserve our nation’s digital resources for the benefit of present and future generations” and also established the NDSA organizational structure of 5 working groups with a guiding coordinating committee.

It didn’t take long for the working groups to self-organize and tackle some of the most pressing digital stewardship issues over the first couple of years. The Infrastructure working group released the results from the first Preservation Storage survey and is currently working on a follow-up. The Outreach group released the Digital Preservation in a Box set of resources that provide a gentle introduction to digital stewardship concepts (note to LIS educators: the Box makes a great tool for introducing digital stewardship to your students. Get in touch to see how the NDSA can work with you on lesson plans and more).

The Innovation working group coordinated two sets of NDSA Innovation award winners, recognizing “Individual,” “Project,” “Institution” and “Future steward” categories of superior work in digital stewardship, while the Content working group organized “content teams” around topic areas such as “news, media and journalism” and “arts and humanities” to dive more deeply into the issues around preserving digital content. This work led to the release of the first Web Archiving survey in 2012, with the second underway. The Geospatial Content Team also released the “Issues in the Appraisal and Selection of Geospatial Data” report (pdf) in late 2013.

The NDSA has also worked to inform the digital stewardship community and highlight impressive work with an expanding series of webinars and through the Insights and Content Matters interview series on the Signal blog.


And not least, the “2014 National Agenda for Digital Stewardship” integrated the perspective of NDSA experts to provide funders and executive decision-makers insight into emerging technological trends, gaps in digital stewardship capacity and key areas for funding, research and development to ensure that today’s valuable digital content remains accessible and comprehensible in the future.

Over the coming year, the NDSA will expand its constituent services, working to integrate its rapidly expanding partner network into the rich variety of NDSA activities. The NDSA will also expand its interpersonal outreach activities through broad representation at library, archive and museum conferences and by engaging with partners in a series of regional meetings that will help build digital stewardship community, awareness and activity at the local level.

The next NDSA regional meeting is happening in Philadelphia on Thursday January 23 and Friday January 24, hosted by the Library Company of Philadelphia. We’re also in the early planning stages of a meeting in the Midwest to leverage the work of NDSA partner the Northern Illinois University Library and their POWRR project.

Look for more blog posts in 2014 that provide further guidance on the Levels of Preservation activity. The Dec. 24 post starts working through the cells on each of the levels, with an opening salvo addressing data storage and geographic location issues.

The NDSA has also published a series of reports over the past year, including the “Staffing for Effective Digital Preservation” report from the Standards and Practices working group. Look for a new report early in 2014 on the issues around the release of the PDF/A-3 specification and its benefits and risks for archival institutions.

The NDSA can look back confidently over the past three years to a record of accomplishment. It hasn’t always been easy; it’s not easy for any volunteer-driven organization to accomplish its goals in an era of diminishing resources. But the NDSA has important work to do and the committed membership to make it happen.

And like the NDSA, I’m looking forward to a healthier, happier 2014, putting those cookies in the rear-view mirror and hoping the pants will eventually fit again.

Categories: Planet DigiPres

11 Great Digital Preservation Photos for 2013

30 December 2013 - 5:02pm

Curiously, most of us in the digital memory business are hesitant to visually document our own work. Possibly this has to do with the perceived nature of the enterprise, which involves tasks that may seem routine.  But pictures tell an important story, and I set about finding a few that depicted some of the digital preservation focal points for the past year.

I did a Flickr search for the words “digital” and “preservation” and limited the results to photos taken in 2013. I also limited the results to “only search within Creative Commons-licensed content” and “find content to modify, adapt, or build upon.” There were 2 to 3 dozen results. While most fell into a couple of common categories, I was pleased to find 11 that struck me as especially engaging, unusual or otherwise interesting.

And while digitization is only a first step in digital preservation, I included a couple of shots that depict digital reformatting activities.

Caritat room, Biblioteca de Catalunya. Unitat de Digitalització. From the ANADP 2013 meeting in Barcelona, by Ciro Llueca, on Flickr

Long-Term Preservation of Digital Art, by transmediate, on Flickr

Main Room, Biblioteca de Catalunya. Unitat de Digitalització, from ANADP 2013, by Ciro Llueca, on Flickr

At Personal Digital Archiving 2013, by Leslie Johnston, on Flickr

With George “The Fat Man” Sanger at Personal Digital Archiving 2013, by Wlef70, on Flickr

Introducing the Archivematica Digital Preservation System, by Metropolitan New York Library Council, on Flickr

Preservation of Tibetan books, Digital Dharma, by Wonderlane, on Flickr

Posters, Biblioteca de Catalunya. Unitat de Digitalització, from ANADP 2013, by Ciro Llueca, on Flickr

PlaceWorld is the result of collaboration between artists, social and computer scientists undertaken as part of the eSCAPE Project, by Daniel Rehn, on Flickr

Visualisation Persist: Sustainability of the Information Society, by Elco van Staveren, on Flickr

Digitization Project at Radio Mogadishu, Somalia, by United Nations Photo, on Flickr

Categories: Planet DigiPres

Protect Your Data: Storage and Geographic Location

24 December 2013 - 2:59pm

This post is about row one column one, the first box, in the NDSA levels of digital preservation.

The NDSA levels of digital preservation are useful in providing a high-level, at-a-glance overview of tiered guidance for planning for digital preservation. One of the most common requests received by the NDSA group working on this is that we provide more in-depth information on the issues discussed in each cell.

To that end, we are excited to start a new series of posts, set up to help you and your organization think through how to go about working your way through the cells on each level.

There are 20 cells in the five levels, so there is much to discuss. We intend to work our way through each cell while expounding on the issues inherent in that level. We will define some terms, identify key considerations and point to some secondary resources.  If you want an overall explanation of the levels, take a look at The NDSA Levels of Digital Preservation: An Explanation and Uses.

Let’s start with row one cell one, Protect Your Data: Storage and Geographic Location.

The Two Requirements of Row One Column One

There are only two requirements in the first cell, but there is actually a good bit of practical logic tucked away inside the reasoning for those two requirements.

Two complete copies that are not collocated

For starters you want to have more than one copy and you want to have those two copies in different places. The difference between having a single point of failure and two points of failure is huge.   For someone working at a small house museum that has a set of digital recordings of oral history interviews this might be as simple as making a second copy of all of the recordings on an external hard drive and taking that drive home and tucking it away somewhere. If you only have one copy, you are one spilt cup of coffee, one dropped drive, or one massive power surge or fire away from having no copies. While you could meet this requirement literally by simply making any type of copy of your data and taking it home, it will become clear that this alone is not going to be a tenable solution for you to make it further up the levels in the long run. The point of the levels is to start somewhere and make progress.

With this said, it’s important to note that not all storage media are created equal. The difference in error rates between something like a flash drive on your key chain and an enterprise hard disk or tape is gigantic. So gigantic, in fact, that on error rate alone you would likely be better off having one copy on a far better quality piece of media than two copies on something like two cheap flash drives. Remember, though, that the hard error rate of the storage devices is not the only factor you should be worried about. In many cases, human error is likely to be the biggest factor resulting in data loss, particularly when you have a small (or no) system in place.

“Complete” copies are an important factor here. Defining “completeness” is something worth thinking through.  For example, a “complete copy” may be defined in terms of the integrity of the digital file or files that make up your source and your target.   At the most basic level, when you make copies you want to do a quick check to make sure that the file size or sizes in the copy are the same as the size of the original files. Ideally, you would run a fixity check, comparing for instance the MD5 hash value for all the first copies with the MD5 hash value of the second copies. The important point here is that “trying” to make a copy is not the same thing as actually having succeeded in making a copy.  You are going to want to be sure you do at least a spot check to make sure that you really have created an accurate copy.
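The spot check described above can be sketched in a few lines of Python: a quick file-size comparison first, then an MD5 fixity comparison. This is a minimal illustration, not a substitute for a real fixity tool; the function names are invented for the example.

```python
import hashlib
from pathlib import Path

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in chunks so
    large files do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copy(original, copy):
    """Return True only if the copy matches the original: a cheap
    file-size check first, then a full MD5 comparison."""
    original, copy = Path(original), Path(copy)
    if original.stat().st_size != copy.stat().st_size:
        return False
    return md5_of(original) == md5_of(copy)
```

Running `verify_copy` over each file pair after a copy job is exactly the difference between “trying” to make a copy and knowing you actually succeeded.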

For data on heterogeneous media (optical discs, hard drives, etc.) get the content off the media and into your storage system

A recording artist ships a box full of CDs and hard disks to their label for production of their next release. A famous writer offers an archive of her personal papers and includes two of her old laptops, a handful of 5.25-inch floppies and a few keychain-quality flash drives. An organization’s records management division is given a crate full of rewritable CDs from the accounting department. In each of these cases, a set of heterogeneous digital media has ended up on the doorstep of a steward, often with little or no preliminary communication. Getting the bits off that media is a critical first step. None of these storage methods is intended for the long term; in many cases, things like flash drives and rewritable CDs are not expected to function, even in optimal conditions, for more than a few years.

So, get the bits off their original media. But where exactly are you supposed to put them? The requirement in this cell says to put them in your “storage system.” But what exactly is that supposed to mean? It is intentionally vague in the chart, in order to account for different types of organizations, resource levels and overall goals. With that said, the general idea is to focus on good quality media (designed for a longer rather than shorter life), for example “enterprise quality” spinning disk or magnetic tape (or some combination of the two), along with a way of managing what you have. For this first cell, the focus is on the quality of the media. As the requirements move further along, however, it becomes increasingly important to be able to check and validate your data, so easy ways to manage the data across all of your copies become a critical component of your storage strategy. For example, a library of “good” quality CDs could serve as a kind of storage system. However, managing all of those pieces of individual media would itself become a threat to maintaining access to that content. In addition, when you inevitably need to migrate forward to future media, the need to individually transfer everything off that collection of CDs would become a significant bottleneck. In short, the design and architecture of your storage system is a whole other problem space, one not directly covered by the NDSA Levels of Digital Preservation.
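One simple way to start “managing what you have” is to build a manifest of every file and its checksum as content comes off the original media, so later audits have something to check against. A minimal sketch, assuming a CSV manifest (the file and column names here are hypothetical, not any standard):

```python
import csv
import hashlib
from pathlib import Path

def build_manifest(storage_root, manifest_path):
    """Record the relative path, size in bytes and MD5 hash of every
    file under storage_root in a simple CSV manifest."""
    with open(manifest_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "md5"])
        for path in sorted(Path(storage_root).rglob("*")):
            if path.is_file():
                digest = hashlib.md5(path.read_bytes()).hexdigest()
                writer.writerow(
                    [path.relative_to(storage_root), path.stat().st_size, digest]
                )
```

A future fixity audit can then recompute the hashes and diff them against the manifest, which is far easier than re-examining a shelf of individual discs.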

Related Resources

You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content Ricky Erway, 2012

The NDSA Levels of Digital Preservation: An Explanation and Uses Megan Phillips, Jefferson Bailey, Andrea Goethals, Trevor Owens

How Long Will Digital Storage Media Last? Personal Digital Archiving Series from The Library of Congress

Categories: Planet DigiPres

Can You Digitize A Digital Object? It’s Complicated

23 December 2013 - 4:24pm
Visualization of magnetic information on a Floppy Disk


And if so, why would you ever want to? About a year ago the University of Iowa Libraries Special Collections announced a rather exciting project: to digitize the data tapes from the Explorer I satellite mission. My first thought: the data on these tapes is digital to begin with, so there’s not really anything to digitize here. They explain that the plan is to “digitize the data from the Explorer I tapes and make it freely accessible online in its original raw format, to allow researchers or any interested parties to download the full data set.” It might seem like a minor point for a stickler for vocabulary, but that sounds like transferring or migrating data from its original storage media to new media.

To clarify, I’m not trying to be a pedant here. What they are saying is clear and it makes sense. With that said, I think there are actually some meaningful issues to unpack here about the difference between digital preservation and digitization and reading, encoding and registering digital information.

Digitization involves taking digital readings of physical artifacts

In digitization, one uses some mechanism to create a bitstream: a representation of some set of features of a physical object as a sequence of ones and zeros. In this respect, digitization is always about the creation of a new digital object. The new digital object registers some features of the physical object. For example, a digital camera registers a specific range of color values at a specific, limited number of dots per square inch. Digital audio and video recorders capture streams of discrete numerical readings of changes in air pressure (sound) and of chroma and luminance values over time. In short, digitization involves taking readings of some set of features of an artifact.
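The idea of taking discrete readings of a continuous feature can be shown with a toy sampler. This is purely illustrative; the sample rate and bit depth are arbitrary choices, not anything a real recorder uses:

```python
import math

def digitize(signal, sample_rate=8, bit_depth=3, duration=1.0):
    """Sample a continuous signal at discrete times and quantize each
    reading to one of 2**bit_depth levels. This is digitization in
    miniature: a new, discrete object describing a continuous one."""
    levels = 2 ** bit_depth
    samples = []
    for i in range(int(sample_rate * duration)):
        t = i / sample_rate
        value = signal(t)  # continuous reading in the range [-1, 1]
        quantized = round((value + 1) / 2 * (levels - 1))
        samples.append(quantized)
    return samples

# A 1 Hz sine wave "digitized" at 8 samples per second, 3 bits per sample
tone = digitize(lambda t: math.sin(2 * math.pi * t))
```

Everything the new object records is a reading of the original; everything between the readings is lost, which is exactly why digitization always produces a new digital object rather than the thing itself.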

Reading bits off old media is not digitization

Taking the description of the data tapes from the Explorer I mission, it sounds like this particular project is migrating data: reading the sequence of bits off the original media and then making them accessible. On one level it makes sense to call this digitization; the results are digital, and the general objective of digitization projects is to make materials more broadly accessible. Moving the bits off their original media and into an online networked environment feels the same, but there are important differences. If we have access to the raw data from those tapes, we are not accessing some kind of digital surrogate, or some representation of features of the data; we are actually working with the original. The allographic nature of digital objects means that working with a bit-for-bit copy of the data is exactly the same as working with the bits encoded on their original media. With this noted, and perhaps most interestingly, there are times when one does want to actually digitize a digital object.

When we do digitize digital objects

In most contexts of working with digital records and media for long-term preservation, one uses hardware and software to get access to, and acquire, the bitstream encoded on the storage media. There are particular cases, though, where you can’t or don’t want to do that. In cases where parts of the storage media are illegible, or where there are issues getting a particular storage device to read the bits off the media, there are approaches that bypass a storage device’s interpretation of its own bits and instead register readings of the storage media itself. For example, a tool like KryoFlux can create a disk image of a floppy disk that is considerably larger in file size than the actual contents of the disk. In this case, the tool is actually digitizing the contents of the floppy disk. It stops treating the bits on the disk as digital information and instead records readings of the magnetic flux transition timing on the media itself. The result is a new digital object, one from which you can then work to interpret or reconstruct the original bitstream from the recordings of the physical traces of those bits you have digitized.

So when is and isn’t it digitization?

So, it’s digitization whenever you take digital readings of features of a physical artifact. If you have a bit-for-bit copy of something, you have migrated or transferred the bitstream to new media, but you haven’t digitized it. With that said, there are indeed times when you want to take digital readings of features of the actual analog media on which a set of digital objects is encoded. That is a situation in which you are digitizing a set of features of the analog media on which digital objects reside. What do you think? Is this a helpful clarification? Do you agree with how I’ve hashed this out?

Categories: Planet DigiPres

Am I a Good Steward of My Own Digital Life?

20 December 2013 - 7:08pm

After reading a great post by the Smithsonian Institution Archives on Archiving Family Traditions, I started thinking about my own activities as a steward of my and my family’s digital life.

I give myself a “C” at best.

My mother's china. Photo by Leslie Johnston


Now, I am not a bad steward of my own digital life. I make sure there are multiple copies of my files in multiple locations and on multiple media types. I have downloaded snapshots of websites. I have images of some recent important texts. I download copies of email from a cloud service into an offline location in a form that, so far, I have been able to read and migrate across hardware. I have passwords to online accounts documented in a single location that I was able to take advantage of when I had a sudden loss.

I certainly make sure my family is educated about digital preservation and preservation in general, to the point that I think (know?) they are sick of hearing about it. I have begun a concerted but slow effort to scan all the family photos in my possession and make them available with whatever identifying metadata (people, place, date) that I gathered from other family members, some of whom have since passed away. I likely will need to crowdsource some information from my family about other photos.

But I am not actively archiving our traditions. I often forget to take digital photos at events, or record metadata when I do take them. I have never collected any oral histories. I have not recorded my own memories.  I do have some of my mother’s recipes (and cooking gear) and I need to make sure that these are documented for future generations.  I have other items that belonged to my mother and grandmother that I also need to more fully document so others know their provenance and importance.  And then I need to make sure all my digital documentation is distributed and preserved.

I asked some friends what they were doing, and got some great answers.  One is creating a December Daily scrapbook documenting the activities of the month. One has been documenting the holiday food she prepares and family recipes for decades, in both physical and digital form.  One has been making a photobook of the year for every year since her children were born, and plans to create a book of family recipes. Another has been recording family oral histories, recording an annual family religious service for over 20 years, and is digitizing family photos that date back as far as the 1860s.

How are you documenting and archiving your family’s traditions, whether physical or digital? And preserving that documentation?


Categories: Planet DigiPres

The Top 14 Digital Preservation Posts of 2013 on The Signal

19 December 2013 - 6:34pm
Based on Tip Top Liquors, by Thomas Hawk, on Flickr


The humble bloggers who toil on behalf of The Signal strive to tell stimulating stories about digital stewardship. This is unusual labor. It blends passion for a rapidly evolving subject with exacting choices about what to focus on.

Collecting, preserving and making available digital resources is driving enormous change, and the pace is so fast and the scope so broad that writing about it is like drinking from the proverbial firehose.

Back when The Signal was a mere gleam in the eye, institutional gatekeepers were, as is their wont, skeptical. “Can you make digital preservation interesting?” they asked. “Is there enough to write about? Will anyone care?”

While we responded with a bureaucratic version of “yes, of course!” to each question, we had to go prove it. Which, after many months and hundreds of posts, I think we have done.

I attribute success to stories that have meaning in the lives of our readers, most of whom care deeply about digital cultural heritage. As noted, that topic is as diverse as it is dynamic. A good way to gauge this is to consider the range of posts that were the most popular on the blog for the year. So here, ranked by page views based on the most current data, are our top 14 posts of 2013 (out of 257 total posts).

  1. 71 Digital Portals to State History
  2. You Say You Want a Resolution: How Much DPI/PPI is Too Much?
  3. Is JPEG-2000 A Preservation Risk?
  4. Scanning: DIY or Outsource
  5. Snow Byte and the Seven Formats: A Digital Preservation Fairy Tale
  6. Social Media Networks Stripping Data from Your Digital Photos
  7. Fifty Digital Preservation Activities You Can Do
  8. Announcing a Free “Perspectives on Personal Digital Archiving” Publication
  9. Top 10 Digital Preservation Developments of 2012
  10. Analysis of Current Digital Preservation Policies: Archives, Libraries and Museums
  11. The Metadata Games Crowdsourcing Toolset for Libraries & Archives: An Interview with Mary Flanagan
  12. Doug Boyd and the Power of Digital Oral History in the 21st Century
  13. Moving on Up: Web Archives Collection Has a New Presentation Home
  14. Anatomy of a Web Archive

Special bonus: Page views are only one way to measure top-of-the-yearness. In the blogging world, comments are also important, as they indicate the degree to which readers engage with a post. By that measure, the top 14 posts of 2013 are slightly different.

  1. 71 Digital Portals to State History (51 comments)
  2. Snow Byte and the Seven Formats: A Digital Preservation Fairy Tale (21 comments)
  3. Is JPEG-2000 A Preservation Risk? (17 comments)
  4. 39 And Counting: Digital Portals to Local Community History (16 comments)
  5. Social Media Networks Stripping Data from Your Digital Photos (14 comments)
  6. You Say You Want a Resolution: How Much DPI/PPI is Too Much? (13 comments)
  7. What Would You Call the Last Row of the NDSA Levels of Digital Preservation? (12 comments)
  8. CURATEcamp Exhibition: Exhibition in and of the Digital Age (11 comments)
  9. Word Processing: The Enduring Killer App (10 comments)
  10. Older Personal Computers Aging Like Vintage Wine (if They Dodged the Landfill) (10 comments)
  11. Scanning: DIY or Outsource (10 comments)
  12. Where is the Applied Digital Preservation Research? (8 comments)
  13. The “Spherical Mercator” of Time: Incorporating History in Digital Maps (8 comments)
  14. Opportunity Knocks: Library of Congress Invites No-cost Digitization Proposals (7 comments)

Thank you to all our readers, and most especially to our commenters.

Categories: Planet DigiPres

A Digital Descartes: Steve Puglia, Digital Preservation Pioneer

18 December 2013 - 3:11pm
Steven Puglia (Photo by Barry Wheeler)


Steven Puglia, manager of Digital Conversion Services at the Library of Congress, died peacefully on December 10, 2013 after a year-long battle with pancreatic cancer. Puglia had a profound effect on his colleagues here in Washington and worldwide, and there is a great outpouring of grief and appreciation in the wake of his passing.

The testimony embedded in this tribute demonstrates that Steve’s passing left the cultural heritage, conservation and preservation communities stunned, somber and affectionate. Their words attest to his character, his influence and the significance of his work. He was a rare combination of subject-matter expert and gifted, masterful teacher, who captivated and inspired audiences.

“Generous” is a word colleagues consistently use to describe Puglia – generous with his time, energy, advice and expertise. He was a pleasure to be around, the kind of colleague you want in the trenches with you – compassionate, kind and brilliant, with a wry sense of humor.

Steve enjoyed sharing his knowledge and helping others understand. From International Standards groups to workshops, from guidelines to desk-side help for colleagues, Steve sought out opportunities to teach. During discussions of how detailed to get in the Guidelines, Steve would often remind us that digitization is, by its nature, a technical endeavor…He worked even harder to make it palatable for those who simply hadn’t gotten it yet. — Jeff Reed, National Archives and Records Administration and co-author with Steve Puglia and Erin Rhodes of the 2004 Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files

Photography defined Puglia’s life — both the act of photography and the preservation and access of photographs. It was at the root of his work even as his professional life grew and branched in archival, preservationist and technological directions.

He earned a BFA in Photography from the Rochester Institute of Technology in 1984 and worked at the Northeast Document Conservation Center duplicating historic negatives. In 1988, Puglia earned an MFA in Photography from the University of Delaware and went to work for the National Archives and Records Administration’s reformatting labs as a preservation and imaging specialist.

At NARA, Puglia worked with microfilm, storage of photographs and establishing standards for negative duplication. With the advent of the digital age, Puglia set up NARA’s first digital imaging department and researched the impact of digital technology on the long-term preservation of scanned images. He was instrumental in developing new methods of digital image preservation and helping to set imaging standards.

I feel very fortunate and thankful that I had the opportunity to work alongside Steve and to learn so much from him; Steve was a smart, inquisitive, kind, generous colleague, but even more so, he was an amazing teacher. He was generous in sharing his vast knowledge of digitization as well as traditional photographic processes and concepts – and the intersection of the two – in the work that we were doing at NARA.

I think writing the Guidelines was a labor of love for all of us, but especially for Steve. We collectively worried about how they would be perceived, how they would be useful, and about all the small details of the document. I remember especially struggling and working on the Image Parameter tables for different document types, all of us knowing these would probably be the most consulted part of the Guidelines. The fact that these tables are still relevant and stand strong today is a testament to Steve’s knowledge and contributions to the field. I feel lucky that I had a chance to learn from Steve; he was my first real mentor. We should all feel lucky to benefit from his knowledge. He will be missed. — Erin Rhodes, Colby College and co-author with Steve Puglia and Jeff Reed of the 2004 Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files

In 2011, Puglia joined the Library of Congress as manager of Digital Conversion Services where he oversaw the research and development of digital imaging approaches, data management, development of tools and other technical support in the Digital Imaging Lab.

It was not his first time working with the Library. In 1991 and 1992 he collaborated with the Preservation Directorate and over the past several years he had been a major contributor to the Federal Agencies Digitization Guidelines Initiative. He became chair of the FADGI Still Image Working Group; in August 2011, he posted an update about the Still Image Working Group on The Signal.

Steve was a driving force in creating guidelines to help steer cultural heritage institutions towards standardized methods for digitizing their treasures. While at NARA, he was the primary author of the Technical Guidelines for Digitizing Cultural Heritage Materials: Creation of Raster Image Master Files, the 2004 document that continues to serve as a teaching tool and reference for all those involved in digital imaging. In 2007, Steve extended his efforts to form the FADGI Still Images Working Group and participated as a key technical member, providing invaluable input on practically every aspect of imaging technique and workflow.

I chaired the group from its start through 2010, and I could not have accomplished half of what I did without Steve. When I was at a loss as to how to best proceed, Steve provided the guidance I needed. He was one of the most genuine and honorable individuals I have known. Steve was selfless in giving his time to anyone who needed assistance or advice, and he will be missed by those who knew him. His passing is a tremendous loss to the cultural heritage imaging community. — Michael Stelmach, former Digital Conversion manager at the Library and past FADGI coordinator.

In reading Puglia’s June, 2011, Signal blog post about the JPEG 2000 Summit, you get a sense of his excitement for his work and a taste of how well he can communicate a complex subject in simple language.

This aspect of Puglia’s character comes up repeatedly: his drive to make his work clearly understood by anyone and everyone. In Sue Manus’s blog post introducing Puglia to readers of The Signal, she writes, “He says the next steps include working to make the technical concepts behind these tools better understood by less technical audiences, along with further development of the tools so they are easier to work with and more suited to process monitoring and quality management.” And “From an educational perspective, he says it’s important to take what is learned about best practices and present the concepts and information in ways that help people understand better how to use available technology.”

Colleagues declare that Puglia was a key figure in setting standards and guidelines. They report that he led the digital-preservation profession forward and he made critical contributions to the cultural heritage community. They praise his foresight and his broad comprehension of technology, archives, library science, digital imaging and digital preservation, all tempered by his practicality. And they all agree that the impact of his work will resonate for a long time.

Sometimes the best discussions–the ones you really learn from–are conversations in which the participants express different ideas and then sort them out. It’s like the college dorm debates that can make the lounge more instructive than a classroom. Over the years, I learned from Steve in exchanges leavened with friendly contrariety. For example, in 2003, we were both on the program at the NARA preservation conference. I was helping plan the new Library of Congress audiovisual facility to be built in Culpeper, Virginia, and my talk firmly pressed the idea that the time had come for the digital reformatting of audio and video, time to set aside analog approaches. Steve’s presentation was about the field in a more general way and it was much more cautious, rich with reminders about the uncertainties and high costs that surrounded digital technologies, as they were revealed to us more than a decade ago.

In the years that followed, our small tug of war continued and I saw that Steve’s skepticism represented the conservatism that any preservation specialist ought to employ. I came to think of him as a digital Descartes, applying the great philosopher’s seventeenth century method of doubt to twenty-first century issues. And like Descartes, Steve mustered the best and newest parts of science (here: imaging science) to build a coherent and comprehensive digital practice.

He may have been a slightly reluctant digital preservation pioneer but without doubt he was a tremendous contributor whose passing is a great loss to friends and colleagues. — Carl Fleischhauer, Library of Congress digital format specialist and FADGI coordinator

Puglia’s ashes will be scattered in New Hampshire along a woodland brook that he loved. A fitting end for a photographer.

Great Brook on Flickr by mwms1916


Categories: Planet DigiPres

Do You Have Digital Preservation Tools? COPTR Needs You!

17 December 2013 - 4:30pm

A few weeks ago, as part of the Aligning National Approaches to Digital Preservation conference, an announcement was made of the beta launch of a new resource to catalog and describe digital preservation tools: the Community Owned digital Preservation Tool Registry (COPTR).

The idea behind this registry is to try and consolidate all of the digital preservation tool resources into one place, eliminating the need for many separate registries in multiple organizations.

As an example of how this will be useful, at NDIIPP we have our own tools page that we have maintained over the years. Many of the tools on this list have been produced either by the Library of Congress or by our NDSA partners, with the overall aim of providing these tools to the wider digital preservation community. Of course, the tools themselves, or the links to them, change on a fairly regular basis; they are either updated or replaced altogether. And as our list has grown, there is also the possibility of duplication with other such lists or registries being produced elsewhere. We have provided this to our users as an overall resource, but the downside is that it requires regular maintenance. For now, our tools page is still available, but we have put any updates on hold in anticipation of switching over to COPTR.

COPTR is meant to resolve such issues of duplication and maintenance, and to maintain a more centralized, up-to-date, one-stop shop for all digital preservation related tools.

For ease of use, COPTR is presented as a wiki – anyone can add tools to the registry or edit and update existing entries. Here’s how it’s described by Paul Wheatley, one of the original developers of this effort:

“The registry aims to support practitioners in finding the tools they need to solve digital preservation problems, while reducing the glut of existing registries that currently exacerbate rather than solve the challenge. (I’ve blogged in detail about this.)

COPTR has collated the contents of five existing tool registries to create a greater coverage and depth of detail that has to date been unavailable elsewhere. The following organisations have partnered with COPTR and contributed data from their own registries: The National Digital Stewardship Alliance, The Digital Curation Centre (DCC),  The Digital Curation Exchange (DCE), The Digital POWRR Project, The Open Planets Foundation (OPF)”

The above organizational list is not meant to be final, however.  Wheatley emphasizes that they are looking for other organizations to participate in COPTR and to share their own tool registries.

On the wiki itself, the included tools are grouped into “Tools by Function” (disk imaging, personal archiving, etc.) or “Tools by Content” (audio, email, spreadsheet, etc.)  According to the COPTR documentation, specific information for each tool will include the description and specific function, relevant URLs to the tool or resources and any user experiences. Generally, the tools to be included will be anything in the realm of digital preservation itself, such as those performing functions described in the OAIS model or in a digital lifecycle model. More specifically, the COPTR site describes in-scope vs. out-of-scope as the following:

  • In scope: characterisation, visualisation, rendering, migration, storage, fixity, access, delivery, search, web archiving; open source software, commercial software and everything in between.
  • Out of scope: digitisation, file creation

According to Wheatley, the goal is for organizations to eventually close their own registries and instead reference COPTR. The availability of a data feed from COPTR provides a useful way for them to expose COPTR (or subsets of the COPTR data) on their own sites.

This overall goal may sound ambitious, but it’s ultimately very pragmatic: to create a community-built resource that is accurate, comprehensive, up-to-date and eliminates duplication.

COPTR Needs You! To make this effort a success, the organizers are asking for some help.

And feel free to contribute feedback in the comment section of this blog post, below.

COPTR is a community registry that is owned by the community, for the community. It is supported by Aligning National Approaches to Digital Preservation , The Open Planets Foundation , The National Digital Stewardship Alliance, The Digital Curation Centre , The Digital Curation Exchange and the Digital POWRR Project.

Categories: Planet DigiPres

Just Released: Staffing for Effective Digital Preservation: An NDSA Report

16 December 2013 - 6:51pm

The following is a guest post by report co-authors and NDSA Standards and Practices Working Group members:

  • Winston Atkins, Duke University Libraries
  • Andrea Goethals, Harvard Library
  • Carol Kussmann, Minnesota State Archives
  • Meg Phillips, National Archives and Records Administration
  • Mary Vardigan, Inter‐university Consortium for Political and Social Research (ICPSR)

The results of the 2012 National Digital Stewardship Alliance Standards and Practices Working Group’s digital preservation staffing survey have just been released!  Staffing for Effective Digital Preservation: An NDSA Report (pdf) shares what we learned by surveying 85 institutions with a mandate to preserve digital content about how they staffed and organized their preservation functions. You may remember that The Signal blogged about the survey on August 8, 2012 to encourage readers to participate:  “How do you staff your Digital Preservation Initiatives?” As promised in that post and elsewhere, the results of the survey are now publicly available and the survey data have been archived for future use.

We’ll highlight some of the significant findings here, but we encourage you to read the full report and let us know what you think – both about the report and the current state of digital preservation staffing.

The NDSA found that most organizations surveyed had no dedicated digital preservation department to take the lead in this area. In most cases, preservation tasks fell to a library, archive or other department. Close to half of respondents thought that the digital preservation function in their organizations was well organized, but a third were not satisfied and many were unsure.

Another key finding is that almost all institutions believe digital preservation is understaffed. Organizations wanted almost twice the number of full-time equivalents they currently had. Most organizations are retraining existing staff to manage digital preservation functions rather than hiring new staff.

The survey also asked specifically about the desired qualifications for new digital preservation managers.  Respondents believe that passion for digital preservation and a knowledge of digital preservation standards, best practices, and tools are the most important characteristics of a good digital preservation manager, not a particular educational background or past work experience.

Other findings from the survey showed that most organizations expected the size of their holdings to increase substantially in the next year. Twenty percent expect their current content to double. Images and text files are the most common types of content being preserved. Most organizations are performing the majority of digital preservation activities in‐house but many outsource some activities (digitization was the most common) and are hoping to outsource more.

The survey provides some useful baseline data about staffing needs, and the NDSA Standards and Practices Working Group recommends that the survey be repeated in two to three years to show change over time as digital preservation programs mature and as more organizations self‐identify as being engaged in digital preservation.

What do you think? We welcome your comments on the current report or any recommendations about the next iteration of the survey.

Categories: Planet DigiPres

Mapping the Movement of Books Using Viewshare: An Interview with Mitch Fraas

13 December 2013 - 4:07pm

Mitch Fraas, Scholar in Residence at the Kislak Center for Special Collections, Rare Books, and Manuscripts at the University of Pennsylvania and Acting Director, Penn Digital Humanities Forum, writes about using Viewshare for mapping library book markings.  We’re always excited to see the clever and interesting ways our tools are used to expose digital collections, and Mitch was gracious enough to talk about his experience with Viewshare in the following interview.

Offenbach Library Marks View, created by Mitch Fraas.

Erin:  I really enjoyed reading about your project to map library book markings of looted books in Western Europe during the 1930s and 1940s.  Could you tell us a bit about your work at the University of Pennsylvania Libraries with this collection?

Mitch: One of the joys of working in a research library is being exposed to all sorts of different researchers and projects. The Kislak Center at Penn is home to the Penn Provenance Project, which makes available photographs of provenance markings from several thousand of our rare books. That project got me thinking about other digitized collections of provenance markings. I’ve been interested in WWII book history for a while and I was fortunate to meet Kathy Peiss, a historian at Penn working in the field, and so hit upon the idea of this project. After the war, officials at the Offenbach collecting point for looted books took a number of photographs of book stamps and plates and made binders for reference. Copies of the binders can be found at the National Archives and Records Administration and the Center for Jewish History. For the set on Viewshare, I used the digitized NARA microfilm of the binders.

Erin: I was particularly excited to see that you used Viewshare as the tool to map the collection. What prompted your use of Viewshare and why did you think it would be a good fit for your project?

Mitch: Viewshare really made this project simple and easy to do. I first heard about it through the library grapevine maybe a year and a half ago and started experimenting with it for some of Penn’s manuscript illuminations. I like the ease of importing metadata from delimited files like spreadsheets into Viewshare and the built-in mapping and visualization features. Essentially it allowed me to focus on the data and worry less about formatting and web display.

An individual item record from the Offenbach View.

Erin: You mentioned that these photographs of the book markings are available through NARA’s catalog and that CJH has digitized copies of albums containing photos of the markings. Could you talk a little about the process of organizing the content and data for your view. For example, what kinds of decisions did you make with respect to the data you wanted to include in the view?

Mitch: This is always a difficult issue when dealing with visualizations. Displaying data visually is so powerful that it can obscure the choices made in its production and overdetermine viewer response. There are several thousand book markings from looted books held by NARA and the CJH, but I chose just those identified in the 1940s as originating from “Germany.” Especially when mapping, I worried that providing a smattering of data from throughout the collection could be extremely misleading, and I wanted as tight a focus as possible. Even with this, of course, there are still many holes and elisions in the data. For example, my map includes book stamps from today’s Russia, Czech Republic, Hungary and Poland. These were of course part of the Third Reich at the time, but book markings from those countries are found in many different parts of the albums, because the officers at the Offenbach depot who sorted book markings kept separate “Eastern” albums largely based on language – so for these areas the map definitely shows only an extremely fragmentary picture.

Erin: We’ve found that users of Viewshare often learn things about their collections through the different views they build – maps, timelines, galleries, facets, etc. What was the most surprising aspect of the collection you learned through Viewshare?

Mitch: I have to admit to being surprised at the geographic distribution of these pre-war libraries. Though obviously there are heavy concentrations in large cities like Berlin, there are also an enormous variety of small community libraries spread throughout Germany represented in the looted books. I didn’t get a real sense for this distribution until I saw the Viewshare map for the first time.

A cluster of Jewish libraries around Koblenz.

Erin: Your project is an interesting example of using digitized data to do cross-border humanities research. Could you talk about some of the possibilities and challenges of using a visualization and access tool like Viewshare for exchanging data and collaborating with scholars around the world?

Mitch: Thanks to what I was able to do with Viewshare I got in touch with Melanie Meyers, a librarian at the CJH, and am happy to say that the library there is working on mapping all of the albums from the Offenbach collection. The easy data structure for Viewshare has allowed me to share my data with them and I hope that it can be helpful in providing a more complete picture of pre-war libraries and book culture.

Erin: Do you have any suggestions for how Viewshare could be enhanced to meet the diverse needs of scholars?

Mitch: Though easier said than done, the greatest need for improvement I see in Viewshare is in creating a larger user and viewer base. The images I use for my Viewshare collection are hosted via Flickr which has much less structured data functionality but has a built-in user community and search engine visibility. In short, I’d love to see Viewshare get all the publicity it can!

Categories: Planet DigiPres

Can I Get a Sample of That? Digital File Format Samples and Test Sets

12 December 2013 - 3:36pm
These are my kind of samples! Photo of chocolate mayo cake samples by Matt DeTurck on Flickr.

If you’ve ever been to a warehouse store on a weekend afternoon, you’ve experienced the power of the sample. In the retail world, samples are an important tool to influence potential new customers who don’t want to invest in an unknown entity. I certainly didn’t start the day with lobster dip on my shopping list but it was in my cart after I picked up and enjoyed a bite-sized taste. It was the sample that proved to me that the product met my requirements (admittedly, I have few requirements for snack foods) and fit well within my existing and planned implementation infrastructure (admittedly, not a lot of thought goes into my meal-planning) so the product was worth my investment. I tried it, it worked for me and fit my budget so I bought it.

Of course, samples have significant impact far beyond the refrigerated section of warehouse stores. In the world of digital file formats, there are several areas of work where sample files and curated groups of sample files, which I call test sets, can be valuable.

The spectrum of sample files

Sample files are not all created equal. Some are created as a perfect, ideal example of the archetypal golden file, some have suspected or confirmed errors of varying degrees, while still others are engineered to be non-conforming or just plain bad. Is it always an ideal “golden” everything-works-perfectly example, or do less-than-perfect files have a place? I’d argue that you need both. It’s always good to have a valid and well-formed sample, but you often learn more from non-conforming files because they can highlight points of failure or other issues.

Oliver Morgan of MetaGlue, Inc., an expert consultant working with the Federal Agencies Digitization Guidelines Initiative AV Working Group on the MXF AS-07 application specification, has developed the “Index of Metals” scale for sample files created specifically for testing purposes during the specification drafting process. The scale ranges from gold (engineered to be good/perfect) to plutonium (engineered to be poisonous).

An Index of Metals demonstrating a possible range of sample file qualities from gold (perfect) to plutonium (poisonous on purpose). Slide courtesy of Oliver Morgan, MetaGlue, Inc.

Ideally, the file creator would have the capability and knowledge to make files that conform to specific requirements, so they know what’s good, bad and ugly about each engineered sample. Perhaps equally important as the file itself is the accompanying documentation, which describes the goal and attributes of the sample. Some examples of this type of test set are the Adobe Acrobat Engineering PDF Test Suites and Apple’s QuickTime Sample Files.

Of course, not all sample files are planned out and engineered to meet specific requirements. More commonly, files are harvested from available data sets, web sites or collections and repurposed as de facto digital file format sample files. One example of this type of sample set is the Open Planets Foundation’s Format Corpus. These files can be useful for a range of purposes. Viewed in the aggregate, these ad hoc sample files can help establish patterns and map out structures for format identification and characterization when format documentation or engineered samples are deficient or lacking. Conversely, these non-engineered test sets can be problematic, especially when they deviate from the format specification standard. How divergent from the standard is too divergent before the file is considered fatally flawed, or even another file format?
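To make the gold-to-plutonium idea concrete, here is a minimal sketch (in Python, with illustrative names; it is not drawn from any of the test suites mentioned above) of engineering a matched pair of samples: a well-formed gzip file and a deliberately corrupted variant, plus a crude conformance check.

```python
import gzip

# "Gold" sample: a well-formed gzip member, engineered to be valid.
gold = gzip.compress(b"sample payload")

# "Plutonium" sample: engineered to be bad on purpose by clobbering
# the two magic-number bytes at the start of the gzip header.
plutonium = b"\x00\x00" + gold[2:]

def conforms(data):
    """Crude conformance check: does the data decompress cleanly?"""
    try:
        gzip.decompress(data)
        return True
    except OSError:
        return False
```

A real engineered test set would pair each file with documentation of exactly what was broken and why, as noted above.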

Audiences for sample files

In the case of specification drafting, engineered sample files can be useful not only as part of a feedback loop for the specification authors to highlight potential problems and omissions in the technical language, but also later on to manufacturers and open-source developers who want to build tools that can interact with the file type to produce valid results.

At the Library of Congress, we sometimes examine sample files when working on the Sustainability of Digital Formats website so we can see with our own eyes how the file is put together. Reading specification documentation (which, when it exists, isn’t always as comprehensive as one might wish) is one thing but actually seeing a file through a hex viewer or other investigative tool is another. The sample file can clarify and augment our understanding of the format’s structure and behavior.

Other efforts focusing on format identification and characterization issues, such as JHOVE and JHOVE2, the National Archives UK’s DROID, OPF’s Digital Preservation and Data Curation Requirements and Solutions and Archive Team’s Let’s Solve the File Format Problem, have a critical need for format samples, especially when other documentation about the format is incomplete or just plain doesn’t exist. Sample files, especially engineered test sets, can help efforts such as NARA’s Applied Research and their partners establish patterns and rules, including identifying magic numbers, which are an essential component of digital preservation research and workflows. Format registries like PRONOM and UDFR rely on the results of this research to support digital preservation services.
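As a rough illustration of the magic-number matching that underpins tools like DROID: the sketch below checks a file’s leading bytes against a tiny table of well-known published signatures. The function and its table are illustrative only, not any tool’s actual implementation.

```python
# Minimal sketch of magic-number-based format identification.
# The signatures are well-known published values; real registries
# like PRONOM hold thousands of far more detailed signatures.
MAGIC_NUMBERS = {
    b"%PDF-": "PDF document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\x1f\x8b": "GZIP archive",
    b"PK\x03\x04": "ZIP container",
}

def identify(path):
    """Return a format guess based on the file's leading bytes."""
    with open(path, "rb") as f:
        header = f.read(16)
    for magic, name in MAGIC_NUMBERS.items():
        if header.startswith(magic):
            return name
    return "unknown"
```

Engineered samples earn their keep here: without a known-good file for each signature, there is no way to confirm that rules like these actually fire.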

Finally, there are the institutional and individual end users who might want to implement the file type in their workflows or adopt it as a product, but first they want to play with it a bit. Sample files can help potential implementers understand how a file type might fit into existing workflows and equipment, how it might compare on an information storage level with other file format options, as well as help assess the learning curve for staff to understand the file’s structure and behavior. Adopting a new file format is no small decision for most institutions, so sample files allow technologists to evaluate whether a particular format meets their needs and estimate the level of investment.

Categories: Planet DigiPres

Crossing the River: An Interview With W. Walker Sampson of the Mississippi Department of Archives and History

9 December 2013 - 3:02pm

W. Walker Sampson, Electronic Records Analyst, Mississippi Department of Archives and History

The following is a guest post by Jefferson Bailey, Strategic Initiatives Manager at Metropolitan New York Library Council, National Digital Stewardship Alliance Innovation Working Group co-chair and a former Fellow in the Library of Congress’s Office of Strategic Initiatives.

Regular readers of The Signal will no doubt be familiar with the Levels of Digital Preservation project of the NDSA. A number of posts have described the development and evolution of the Levels themselves as well as some early use cases. While the blog posts have generated excellent feedback in the comments, the Levels team has also been excited to see a number of recent conference presentations that described the Levels in use by archivists and other practitioners working to preserve digital materials. To explore some of the local, from-the-trenches narratives of those working to develop digital preservation policies, resources and processes, we will be interviewing some of the folks currently using the Levels in their day-to-day work. If you are using the Levels within your organization and are interested in chatting about it, feel free to contact us via our email addresses listed on the project page linked above.

In this interview, we are excited to talk with W. Walker Sampson, Electronic Records Analyst, Mississippi Department of Archives and History.

JB: Hi Walker. First off, tell us about your role at the Mississippi Department of Archives and History and your day-to-day activities within the organization.

WS: I’m officially an ‘electronic records analyst’ in our Government Records section. It’s a new position at the archives so my responsibilities can vary a bit. While I deliver electronic records management training to government employees, I do most of my work in and with the Electronic Archives group here. This ranges from electronic records processing to a number of digital initiatives – Flickr, Archive-It and I think most importantly a reconsideration of our digital repository structure.

JB: What are some of the unique challenges to working on digital preservation within a state agency, especially one that “collects, preserves and provides access to the archival resources of the state, administers museums and historic sites and oversees statewide programs for historic preservation, government records management and publications”? That is a diverse set of responsibilities!

WS: It is! Fortunately for us, those duties are allocated to different divisions within the department. Most of the digital preservation responsibilities are directed to the Archives and Records Services division.

The main challenges here are twofold: a large number of records creators – over two hundred state agencies and committees – and, following from that, a potentially voluminous amount of born-digital records to process and maintain. I suppose, however, that this latter challenge may not be unique to state archives.

I would also say that governance is a perennial issue for us, as it may be for a number of state archives. That is, it can be difficult to establish oversight for any state organization’s records at any given point of the life cycle. According to our state code we have a mandate to protect and preserve, but this does not translate into clear actions that we can take to exercise oversight.

JB: How have MDAH’s practices and workflows evolved as the amount of digital materials it collects and preserves has increased?

WS: MDAH is interesting because we started an electronic archives section relatively early, in 1996. We were able to build up a lot of the expertise in house to process electronic records through custom databases, scripts and web pages. This initiative was put together before I began working here, but one of my professors in the School of Information at UT Austin, Dr. Patricia Galloway, was a big part of that first step.

Since then the digital preservation tool or application ‘ecosystem’ has expanded tremendously. There’s an actual community with stories, initiatives, projects and histories. However, we mostly do our work with the same strategy as we began – custom code, scripts and pages. It has been difficult to find a good time to cross the river and use more community-based tools and workflows. We have an immense amount of material that would need to be moved into any new system, and one can find different strata of description and metadata formatting practices over time.

I think that crossing will help us handle the increasing volume, but I also think this big leap into a community-based software (Archivematica, DSpace and so on) will give us an opportunity to reconsider how digital records processing and management happens.

JB: Having seen your presentation at the SAA 2013 conference during the Digital Preservation in State and Territorial Archives: Current State and Prospects for Improvement panel, I was very interested in your discussion of using the Levels of Digital Preservation as part of a more comprehensive self-assessment tool. Tell us both about your overall presentation and about your use of the Levels.

WS: I should start by just covering briefly the Digital Preservation Capability Maturity Model. This is a digital preservation model developed by Lori Ashley and Charles Dollar, and it is designed to be a comprehensive assessment of a digital repository. The intention is to analyze a repository by its constituent parts, with organizations then investigating each part in turn to understand where their processes and policies should be improved. It is up to the particular organization to prioritize what aspects are most relevant or critical to them.

The Council of State Archivists developed a survey based off this model, and all state and territorial archives took that survey in 2011. The intention here was to try and get an accurate picture of where preservation of authentic digital records stands across the country’s state archives.

This brings us to the SAA 2013 presentation. I presented MDAH’s background and follow-up to this survey along with two other state archives, Alabama and Wyoming. In my portion I highlighted two areas for improvement for us here in Mississippi, the first being policy and the second technical capacity.

Although the Levels of Digital Preservation are meant to advise on the actual practice of preservation, we have looked at the chart as a way to articulate policy. The primary reason for this is because the chart really helps to clarify at least some of what we are protecting against. That helps communicate why a body like the legislature ought to have a stake in us.

For example, when I look across the Storage and Geographic Location row of the chart, I’m closer to communicating what we should say in a storage section of a larger digital preservation policy. It’s easier for me to move from “MDAH will create backup copies of preserved digital content” to “MDAH will ensure the strategic backup of digital content which can protect against internal, external and environmental threats,” or something to that effect.

Second, I think the chart can help build internal consensus on what our preservation goals are, and what the basic preservation actions should be, independent of any specific technology. Those are important prerequisites to a policy.

Last, and I think this goes along with my second point, I don’t think policies come out of nowhere. In other words, while it strikes me that some part of a policy should be aspirational, for the most part we want to deliver on our stated policy goals. The chart has helped to clarify what we can and can’t do at this point.

JB: Using the Levels within a larger preservation assessment model is an interesting use case. What specific areas of the DPCMM did the Levels help address? The DPCMM is a much more extensive model and focuses more on self-assessment and ranking, whereas the Levels establish accepted practices at numerous degrees. What were the benefits or drawbacks of using these two documents together?

WS: Besides helping to demonstrate some policy goals, I think the Levels apply most directly to objectives in Digital Preservation Strategy, Ingest, Integrity and Security. There’s some significant overlap in content there, in terms of fixity checks, storage redundancy, metadata and file playback. When you look at the actual survey (a copy of this online somewhere…?), they recommend generally similar actions. I think that’s a good indication of consensus in the digital preservation community, and that these two resources are on target.

While I don’t think there’s a marked drawback to using the two documents together – I haven’t spotted any substantive differences in their preservation advice where their subject areas overlap – one does have to keep in mind the narrower scope of the Levels. In addition, the DPCMM has the OAIS framework as one of its touchstones, so you find ample reference to SIPs, DIPs, AIPs, designated communities and other OAIS concepts. The Levels of Digital Preservation are not going to explicitly address those expectations.

JB: One aspect of the Levels that has been well received is the functional independence of the boxes/blocks. An individual or institution can currently be at different levels in different activity areas of the grid. I would be interested to hear how this aspect helped (or hindered) the document’s use in policy development specifically.

WS: I think it’s been very helpful in formulating policy. The functional independence of the levels lets the chart identify more preservation actions than it might otherwise. While some of those actions won’t ever be specifically articulated in a policy, some certainly will.

For example, the second level of the File Formats category – “Inventory of file formats in use” – is probably not going to be expressed in a policy, though levels 3 and 4 may be. It isn’t necessarily the case that higher levels correlate to policy material, however. For instance, level 1 for Information Security is really more applicable to a policy statement than the level 4 action.

JB: One of the goals of the Levels of Preservation project is to keep its guidance clear and concise, while remaining sensitive to the varied institutional contexts in which the guidance might be used. I would be interested to hear how this feature informed the self-assessment process.

WS: Similar to the functional independence, I think it’s a great feature. The Levels don’t present a monolithic single-course track to preservation capacity, so it doesn’t have to be dismissed entirely in the case that some actions don’t really apply. That said, I felt like really all the actions applied to us quite well, so I think we’re well within the target audience for the document.

The DPCMM really shares this feature. Although it’s meant to help an institution build to trustworthy repository status, it’s not a linear recommendation where an organization is expected to move from one component section to the next. The roadmap would change considerably from one institution to the next.

Categories: Planet DigiPres

December Issue of Digital Preservation Newsletter Now Available

6 December 2013 - 5:22pm

The December 2013 issue of the Library of Congress Digital Preservation newsletter (pdf) is now available!

In this issue:

  • Beyond the Scanned Image:  Scholarly Uses of Digital Collections
  • Ten Tips to Preserve Holiday Digital Memories
  • Anatomy of a Web Archive
  • Updates on FADGI: Still Image and Audio Visual
  • Guitar, Bass, Drums, Metadata
  • Upcoming events: CNI meeting, Dec 9-10; NDSA Regional meeting, Jan 23-24; ALA Midwinter, Jan 24-28; CurateGear, Jan 8; IDCC, Feb 24-27.
  • Conference report on Best Practices Exchange
  • Insights Interview with Brian Schmidt
  • Articles on personal digital archiving, residency program, and more

To subscribe to the newsletter, sign up here.


Categories: Planet DigiPres

Content Matters Interview: The Montana State Library, Part Two

6 December 2013 - 3:52pm
Diane Papineau. Photo credit: Patty Ceglio

This is part two of the Content Matters interview series interview with Diane Papineau, a geographic information systems analyst at the Montana State Library.

Part one was yesterday, December 5, 2013.

Butch: What are some of the biggest digital preservation and stewardship challenges you face at the Montana State Library?

Diane: The two biggest challenges seem to be developing the inventory system and appraising and documenting 25 years of clearinghouse data. MSL is developing the GIS inventory system in-house—we are fortunate that our IT department employs a database administrator and a web developer tasked with this work. The system is in development now and its design is challenging. The system will record not just our archived data, but the Dissemination Information Packages created to serve that data (zipped files, web map services, map applications, etc.) and the relationships between them. For data records alone, we’re wrestling with how to accommodate 13 use cases (data forms and situations), including accommodating parent/child relationships between records. Add to this that we are anxious to be up and running with a sustainable system and the corresponding data discovery tools as we simultaneously appraise and document the clearinghouse data before archiving.

We have archiving procedures in place for the frequently-changing datasets we produce (framework data). However, the existing large collection of clearinghouse data presents a greater challenge. We’re currently organizing clearinghouse data that is actively served and data that’s been squirreled away on external drives, staff hard drives, and even CDs. Much of the data is copies or “near copies” and many original datasets do not have metadata. We need to review the data and document it and for the copies, decide which to archive and which to discard.

When I think of the work ahead of us, I’m reminded of something I read in the GeoMAPP materials. The single most important thing GIS organizations can do to start the preservation process is to organize what they have and document it.

Screen shot of the Montana State Library GIS Archive.

Butch: How have the technologies of digital mapping changed over the past five years? How have those changes affected the work you do?

Diane: The influence of the internet is important to note. Web programmers and lay people are now creating applications and maps using live map services that we make available for important datasets. These are online, live connections to select map data, making mapping possible for people who are not desktop GIS users. With online map makers accessing only a subset of our data (the data provided in these services), we note that they may not make use of the full complement of data we offer. Also, we notice that our patrons are more comfortable these days working with spatial databases, not just shapefiles. This represents a change in patron download data selection, but it would not affect our data and map protocols.

Technologies gaining popularity that may assist our data management and archiving include scripting tools like Python. We anticipate that these tools will help us automate our workflow when creating DIPs, generating checksums, and ingesting data into the archive.
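As a sketch of the kind of Python automation described here, the snippet below walks a data directory and records a SHA-256 checksum for each file, ready to store alongside an archive package. The function names and manifest layout are illustrative assumptions, not MSL’s actual workflow.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    entries = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            entries[rel] = sha256_of(full)
    return entries
```

Re-running the same walk later and comparing against the stored manifest is the basic fixity check that the Levels of Digital Preservation call for.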

Butch: At NDIIPP we’ve started to think more about “access” as a driver for the preservation of digital materials. To what extent do preservation considerations come into play with the work that you do? How does the provision of enhanced access support the long-term preservation of digital geospatial information?

Diane: MSL is in the process of digitizing its state publications holdings. Providing easier public access to them was a strong driver for this effort. Web statistics indicated that once digitized, patron access to a document can go up dramatically.

Regarding our digital geographic data, we have a long history of providing online access to this data. Our current efforts to gain physical and intellectual control over these holdings will reveal long-lost and superseded data that we’ll be anxious to make available given our mandate to provide permanent public access. It may be true that patron access to all of our inventoried holdings may result in more support for our GIS programs, but we’ll be preserving the materials and providing public access regardless.

Butch: How widespread is an awareness of digital stewardship and preservation issues in the part of the geographic community in which the Montana State library operates?

Montana State Library GIS staff. Photo courtesy Montana State Library.

Diane: MSL belongs to a network of professionals who understand and value GIS data archiving and who can be relied on to support our efforts with GIS data preservation. That said, these supportive state agencies and local governments may be in a different position with regard to accomplishing their own data preservation. They are likely wrestling with not having the financial and staff resources or perhaps the policies and administrative level support for implementing data preservation in their own organizations. It’s also quite likely that their business needs are focused on today’s issues. Accommodating a later need for data may be seen as less important. The Montana Land Information Advisory Council offers a grant for applicants wanting to write their metadata and archive their data. To date there have been no applicants.

Beyond Montana, I’ve delivered a GIS data preservation talk at two GIS conferences in New England this year. The information was well-received and engagement in these sessions was encouraging. Two New England GIS leaders with similar state data responsibilities showed interest in how Montana implemented archiving based on GeoMAPP best practices.

Butch: Any final thoughts about the general challenges of handling digital materials within archival collections?

Diane: By comparison to the technical hurdles a GIS shop navigates every day, the protocols for preserving GIS data are pretty straightforward. Either the GIS shop packages and archives the data in house, or the shop partners with an official archiving agency in their state. For GIS organizations, libraries, and archives interested in GIS data preservation, there are many guiding documents available. Start exploring these materials using the NDSA’s draft Geospatial Data Archiving Quick Reference document (pdf).

Categories: Planet DigiPres

Content Matters Interview: The Montana State Library, Part One

5 December 2013 - 7:50pm

Diane Papineau. Photo credit: Patty Ceglio

In this installment of the Content Matters interview series of the National Digital Stewardship Alliance Content Working Group we’re featuring an interview with Diane Papineau, a geographic information systems analyst at the Montana State Library.

Diane was kind enough to answer questions, in consultation with other MSL staff and the state librarian, Jennie Stapp, about the MSL’s collecting mission, especially in regards to their geospatial data collections.

This is part one of a two part interview. The second part will appear tomorrow, Friday December 6, 2013.

Butch: Montana is a little unusual in that the geospatial services division of the state falls under the Montana State Library. How did this come about, and what are the advantages of having it set up this way?

Diane: In addition to a traditional role of supporting public libraries and collecting state publications, the Montana State Library (MSL) hosts the Natural Resource Information System (NRIS), which is staffed by GIS Analysts.

NRIS was established by the Montana Legislature in 1983 to catalog the natural resource and water information holdings of Montana state agencies. In 1987, NRIS gained momentum (and funding) from the federal Environmental Protection Agency and Montana Department of Health and Environmental Sciences to support their mining clean-up work on the Superfund sites along the Clark Fork River between Butte and Missoula. This project generated a wealth of GIS data such as work area boundaries, contaminated area locations, and soil sampling sites, which NRIS used to make a multitude of maps for reports and project management. Storing the data and resulting maps at MSL made sense because it is a library and therefore a non-regulatory, neutral agency. Making the maps and data available via a library democratized a large collection of timely and important geographic information and minimized duplication of effort.

GIS was first employed at NRIS in 1987; from that point forward, NRIS functioned as the state’s GIS data clearinghouse, generating and collecting GIS data. NRIS operated for a decade essentially as a GIS service bureau for state government; during this period, NRIS grew into a comprehensive GIS facility, unique among state libraries. In fact, in the mid-1990s, NRIS participated in the first national effort to provide automated search and retrieval of map data. Today, beyond data clearinghouse activities, MSL is involved with state GIS Coordination as well as GIS leadership and education. We also are involved with data creation or maintenance for 10 of the 15 framework datasets (cadastral, transportation, hydrography, etc.) for Montana, and also host a GIS data archive, thanks to our participation as a full partner in the Geospatial Multistate Archive and Preservation Partnership (GeoMAPP)—a project of the National Digital Stewardship Alliance (NDSA).

Butch: Give us an example of some of the Montana State Library digital collections. Any particularly interesting digital mapping collections?

Diane: Our most important digital geographic collection is the full collection of GIS clearinghouse data gathered over the past 25 years. The majority of this data is “born digital” content made available for download and other types of access via our Data List. Within that collection, one of our most sought-after datasets is the Montana Cadastral framework—a statewide dataset of private land ownership illustrated by tax parcel boundaries. The dataset is updated monthly and is offered for download and as a web map service for desktop GIS users and online mapping. We have stored periodic snapshots of this dataset as it has changed through time and we also serve the most recent version of the data via the online Montana Cadastral map application. The map application makes this very popular data accessible to those without desktop GIS software or training in GIS. Another collection to note is our Clark Fork River superfund site data, which may prove invaluable at some point in the future.

In terms of an actual digital map series, our Water Supply/Drought maps come to mind. For at least 10 years now, NRIS has partnered with the Montana Department of Natural Resources and Conservation (DNRC) to create statewide maps illustrating the soil moisture conditions in Montana by county. DNRC supplies the data; NRIS creates the map and maintains the website that serves the collection of maps through time.

Butch: Tell us a bit about how the collection is being (or might be) used. To what extent is it for the general public? To what extent is it for scholars and researchers?

Diane: Our GIS data collection serves the GIS community in Montana and beyond. Users could be GIS practitioners working on land management issues or city/county planning for example. Other collections, such as our land use and land cover datasets and our collection of aerial photos, may be of particular interest to researchers.  The general public also utilizes this data; because of phone inquiries we receive, we know that hunters, for example, frequently access the cadastral data in order to obtain landowner permission to hunt on private lands. Though we don’t track individual users due to requirements of library confidentiality, we know that the uses for this collection are virtually limitless.

The general public can access much of the geographic data we serve by using our online mapping applications. For example, patrons can use the Montana Cadastral application that I mentioned plus tools like our Digital Atlas to see GIS datasets for their area of interest. They can use our Topofinder to view topographic maps online or to find a place when, for example, all that’s known is the location’s latitude and longitude. In 2008, in partnership with the Montana Historical Society, we published the Montana Place Names Companion—an online map application that helps patrons to learn the name origin and history of places across the state.

Butch: What sparked the Montana State Library to join the National Digital Stewardship Alliance?

Diane: While we’ve played host to this large collection of GIS data and we have long been recognized as the informal GIS data archive for the state, we had yet to maintain an inventory of our holdings. Thankfully, we never threw data out.

We realized that in order to gain physical and intellectual control over this collection of current and superseded data, we needed to modernize our approach. The timing couldn’t have been better because it coincided with the concluding phase of GeoMAPP.  In 2010 MSL participated as an Information Partner, beginning our exposure to formal GIS data archiving issues. Then in 2011, MSL joined GeoMAPP as the project’s last Full Partner. This partnership permitted us to envision applying archivists’ best practices while we reworked and modernized our data management processes.

In some ways we were the GeoMAPP “guinea pig” and we are grateful for that role—so much research had already been done by the other partners and so much information was already available. In return, what MSL could offer to this group was the perspective of three important GeoMAPP target audiences: libraries, archives, and GIS shops.

Butch: Tell us about some of the archiving practices that the Montana State Library has defined as a result of its partnership with GeoMAPP and the National Digital Stewardship Alliance. Why is preservation important for GIS data?

Diane: I’ll start with the “why.” GIS data creation is expensive. By preserving geographic data via archiving, we protect that investment of time and money. GIS data is often used to create public policy. Montana has incredibly strong “right to know” laws, so preserving data that was once available to decision makers supports later inquiry about current laws and policies. Furthermore, making superseded data discoverable and accessible promotes historically informed public policy decisions, wise land use planning, and effective natural disaster planning, to name just a few use cases. From a state government perspective, the published GIS datasets created by state agencies are considered state publications. Our agency is statutorily mandated to preserve state publications and make them permanently accessible to the public.

To guide us in this modernization, MSL developed data management standards, policies, and procedures that require data preservation using archivists’ best practices. I’ll discuss a few highlights from these standards that illustrate our particular organizational needs as a GIS data collector and producer.

In order to appeal to the greater GIS community in Montana, we decided to use more GIS-friendly terms in place of the three “package” terms from the OAIS model. We think of a Submission Information Package (SIP) as “working data,” a Dissemination Information Package (DIP) as a Published Data Package, and an Archive Information Package (AIP) as an Archive Data Package.

MSL chose to take a “library collection development policy” approach to managing a GIS data collection rather than a “records management” approach, which makes use of records retention schedules. What this means is we’re on the lookout for data we want to collect—appraisal happens at the point of collection. If we take the data, we both archive it (creating an AIP) and make DIPs at the same time. The archive is just another data file repository, though a special one with its own rules. If the data acquired is not quite ready for distribution, we modify it from a SIP (our “working data”) to make it publishable. We do not archive the SIP.
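That appraisal-at-collection flow can be sketched roughly as follows; the `Dataset` type, function names, and path layout are illustrative assumptions, not MSL's actual system:

```python
from dataclasses import dataclass, replace

@dataclass
class Dataset:
    name: str
    publishable: bool  # is the incoming working data ready for distribution?

def accession(working_data: Dataset) -> tuple[str, str]:
    """Appraisal happens at the point of collection: if we take the data,
    we create the Archive Data Package (AIP) and the Published Data
    Package (DIP) at the same time; the working copy (the SIP) itself
    is never archived."""
    if not working_data.publishable:
        # rework the working data until it is fit for publication
        working_data = replace(working_data, publishable=True)
    aip = f"archive/{working_data.name}"    # a special repository with its own rules
    dip = f"published/{working_data.name}"  # the distribution copy
    return aip, dip
```

The point of the sketch is the branch: data that is not distribution-ready is reworked first, and only the publishable form feeds both packages.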


Montana State Library Data Collection Management Flow

We’re employing the library discipline’s construct of series and collections and their associated parent/child metadata records, which is new to the GIS group here at MSL. In turn, that decision influenced the file structure of our archive. Though ISO topic categories were GeoMAPP suggestions for both data storage and data discovery, MSL chose instead to organize archive data storage by the time period of the content, unless the data is part of a series (e.g., cadastral) or was generated as part of a discrete project and is considered a collection (e.g., the Superfund data). Additional consistency and structure should come from the use of a new file naming convention (&lt;extent&gt;&lt;theme&gt;&lt;timeframe&gt;).
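As a sketch, the &lt;extent&gt;&lt;theme&gt;&lt;timeframe&gt; convention might be applied like this; the underscore separator and the capitalization are assumptions on my part, not MSL's documented rule:

```python
def archive_package_name(extent: str, theme: str, timeframe: str) -> str:
    """Compose a package name following the <extent><theme><timeframe>
    convention, so names sort and group predictably in the archive."""
    return f"{extent}_{theme}_{timeframe}"

# e.g. a statewide cadastral snapshot for 2013:
archive_package_name("Statewide", "Cadastral", "2013")  # "Statewide_Cadastral_2013"
```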

MSL is archiving data in its original formats rather than converting all data to an archival format (e.g., shapefile) because each data model offers useful spatial characteristics that we did not want to strip from the archived copy. For archive data packaging, we use the Library of Congress tool “Bagger” and we specifically chose to zip all the associated files together before “bagging” to save space in the archive. Zipping the data also permits us to produce one checksum for the entire package, which simplifies dataset management and dataset integrity checking in the workflow. We decided not to use Bagger’s zip function for this because the resulting AIP produced an excessively deep file structure, burying the data in multiple folder levels. To document the AIP in our data management system, we’ve established new archive metadata fields such as date archived, checksum, data format, and data format version.
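The zip-then-checksum step can be sketched with standard-library tools; MSL uses the Bagger application rather than a script, so the function name and file layout here are purely illustrative:

```python
import hashlib
import zipfile
from pathlib import Path

def package_for_archive(files: list[Path], zip_path: Path) -> str:
    """Zip all files belonging to one dataset into a single archive,
    then compute one checksum over that archive, so integrity checking
    covers one file per dataset instead of many."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            zf.write(f, arcname=f.name)  # flat layout avoids deep folder nesting
    digest = hashlib.sha256(zip_path.read_bytes()).hexdigest()
    return digest  # recorded in archive metadata alongside format and version
```

Bagging the resulting zip then yields one payload file and one manifest entry per dataset.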

Part two of this interview will appear tomorrow, Friday, December 6, 2013.

Categories: Planet DigiPres

Happenings in the Web Archiving World

4 December 2013 - 6:11pm

Recently, the world of web archiving has been a busy one. Here are some quick updates:

  • The National Library of Estonia released the Estonian Web Archive to the public. This is of particular note because the Legal Deposit Law in Estonia allows the archive to be publicly accessible online. If you read Estonian you can browse the 1003 records that make up the 1.6 TB of data in the archive. A broad crawl of the entire Estonian domain is planned in 2014.
  • Ed Summers from the Library of Congress gave the keynote address at the National Digital Forum in New Zealand titled The Web as a Preservation Medium. Ed is a software developer and offers a great perspective into some technical aspects of preserving the Web. He covers the durability of HTML, the fragility of links, how preservation is interlaced with access, the importance of community action and the value of “small data”.
  • The International Internet Preservation Consortium 2014 General Assembly will be held at the Bibliothèque nationale de France in Paris May 19-23, 2014. There is still a little time to submit a proposal to speak at the public event on May 19th titled Building Modern Research Corpora: the Evolution of Web Archiving and Analytics.

Call for Proposals announcement from the IIPC:

Libraries, archives and other heritage or scientific organizations have been systematically collecting web archives for over 15 years. Early stages of web archiving projects were mainly focused on tackling the challenges of harvesting web content, trying to capture an interlinked set of documents, and to rebuild its different layers through time. Institutions, especially those on a national level, were also defining their legal and institutional mandates. Meanwhile, approaches to web studies developed and influenced researchers’ and academics’ use of web archives. New requirements have emerged. While the objective of building generic collections remains valid, web archiving institutions and researchers also need to collaborate in order to build specific corpora – from the live web or from web archives.

At the same time, “surfing the web the way it was” is no longer the only way of accessing archived web content. Methods developed to analyse large data sets – such as data or link mining – are applicable to web archives. Web archive collections can thus be a component of major humanities and social sciences projects and infrastructures. With relevant protocols and tools for analysis, they will provide invaluable knowledge of modern societies.

This conference aims to propose a forum where researchers, librarians, archivists and other digital humanists will exchange ideas, requirements, methods and tools that can be used to collaboratively build and exploit web archive corpora and data sets. Contributions are sought that will present:

  • models of collaboration between archiving institutions and researchers,
  • methods and tools to perform data analytics on web archives,
  • examples of studies performed on web archives,
  • alternative ways of archiving web content.

Abstracts (no longer than one page) should be sent to Peter Stirling (peter dot stirling at bnf dot fr) by Friday December 9, 2013. Full details are available at the IIPC website.

Categories: Planet DigiPres

Digital Preservation Pioneer: Gary Marchionini

3 December 2013 - 8:27pm

Gary Marchionini. Photo by University of North Carolina at Chapel Hill.

In 1971, Gary Marchionini had an epiphany about educational technology when he found himself competing with teletype machines for his students’ attention.

Marchionini was teaching mathematics at a suburban Detroit junior high school the year that the school acquired four new teletype machines. The machines were networked to a computer, so a user could type something into a teletype and the teletype would transmit it to the computer for processing.

The school teletypes accessed “drill and practice” programs. The paper-based teletype would print a math problem, a student would type in the answer, wait patiently for the response over the slow, primitive network and eventually the teletype would print out, “Good” (if it was correct).

“The thing was noisy,” said Marchionini. “But the kids still wanted to leave my math classroom to go do this in the closet. There was something about this clickety clackety paper-based terminal that attracted them.

“Eventually I realized that there were two things going on. One was personalization; each kid was getting his own special attention. The other thing was interactivity; it was back and forth, back and forth with the kids. It was engaging.

“That’s what sparked my interest in computer interaction as a line of research.”

That interest became a lifelong mission for Marchionini. He went on to earn his master’s and doctorate in Math Education and Educational Computing from Wayne State University. In 1978 he quit teaching public school, joined the faculty at Wayne State and trained teachers in computer literacy.

In 1983, Marchionini joined the faculty at the University of Maryland College of Library and Information Services; he also joined the Human-Computer Interaction Laboratory.

“It was easy to make the transition from education to library and information services because I always thought of information retrieval as a learning function,” said Marchionini. “The goal of my work was always to enhance learning. And information seeking, from a library perspective… well, people are learning. It could be casual or it could be critical but they are trying to learn something new.”

Marchionini’s research encompassed information science, library science, information retrieval, information architecture and human/computer interface research. He was especially keen on the power of graphics to help people visualize and conceptualize information, and to help people interact with computers to find that information. In fact, as early as 1979, before the explosion of graphic interfaces on personal computers, Marchionini was coding rudimentary graphic representations on his own.

“One of my projects [in 1979] involved addition ‘grouping’ and subtraction ‘regrouping’ – borrowing and carrying and all that stuff,” said Marchionini. “I wrote a computer program that graphically showed that process as a bundling and unbundling of little white dots on a Radio Shack screen.”

Marchionini is quick to point out that graphics were only a part of his interface research, and there is a time and a place for graphics and for declarative text in human/computer interaction. He said that the challenge for researchers was to determine the appropriate function of each.

One interface project that he worked on at UMd also marked his first involvement with the Library of Congress: working with UMd’s Nancy Anderson, professor of psychology (now retired), and Ben Shneiderman, professor of computer science, to add touch screens to the Scorpio and MUMS online catalog interfaces. UMd’s collaborative relationship with the Library continued on into the American Memory project.

“They contracted with us at Maryland to do a series of training events on the user-interface side of American Memory,” said Marchionini. “We did a lot of prototypes. This is some of the early dynamic-query work that Ben Shneiderman and his crew and those of us in the Human Computer Interaction lab were inventing. We worked on several of the sub-collections.”

Marchionini’s expertise is in creating the underlying data architecture and determining how the user will interact with the data; he leaves the interface design — the pretty page — to those with graphic arts talent.

A lot of analysis, thought, research and testing goes into developing appropriate visual cues and prompts to stimulate interactivity with the user. How can people navigate dense quantities of information to quickly find what they’re searching for? What kind of visual shorthand communicates effectively and what doesn’t?

When an interface is well-designed, it doesn’t call attention to itself and the user experience is smooth and seamless. Above all, a well-designed interface always answers the two questions “Where am I?” and “What are my options?”.

Regarding his work on cues and prompts, Marchionini cites another early UMd/Library of Congress online project, the Coolidge-Consumerism collection.

“We wanted to give people ‘look aheads’ and clues about what might happen and what they were getting themselves into if they click on something,” said Marchionini. “The idea was to see if we can show samples of what’s down deep in the collection right up front, either on the search page or on what was in those days the early search-and-results page. It was a lot of fun to work with Catherine Plaisant and UMd students on that. We made some good contributions to interface design.” Marchionini and Plaisant delivered a paper at the Computer-Human Interaction group’s CHI 97 conference titled, “Bringing Treasures to the Surface: Iterative Design for the Library of Congress National Digital Library Program,” which details UMd’s interface design process.

Marchionini has long had an interest in video as a unique means of conveying information. Indeed, he may have recognized video’s potential long before many of his peers did.

In 1994, he and colleagues from the UMd School of Education worked on a project called the Baltimore Learning Community that created a digital library of social studies and science materials for teachers in Baltimore middle schools.

Apple donated about 50 computers. The Discovery Channel offered 100 hours of video, which Marchionini and his colleagues planned to digitize, segment, index and map to the instructional objectives of the state of Maryland. It was an ambitious project and Marchionini said that he learned a lot about interactive video, emerging video formats, video copyrights and the programming challenges for online interactivity.

“We built some pretty neat interfaces,” said Marchionini. “At the time, Java was just coming out and we were developing dynamic query interfaces in the earliest version of Java. We were moving toward web-based applets. And we were building resources for the teachers to save their lesson plans, including comments on how they used the digital assets and wrote comments on them and shared them with other teachers. Basically we were building a Facebook of those days — getting these materials shared with one another and people making comments and adding to other people’s lesson plans so they could re-use them.”

Marchionini adds that the Baltimore Learning Community project is a good example of the need for digital preservation. Today, nothing remains from the project except for some printouts of screen displays of the user interfaces and website, and a few videotapes that show the dynamics.

“Today’s funding agencies’ data-management plan requirements are a step in the right direction of ensuring preservation,” said Marchionini.

In 1998, Marchionini joined the faculty at the School of Information and Library Science at the University of North Carolina, Chapel Hill, where he continued his video research along with his other projects. In 2000, he and Barbara Wildemuth and their students launched Open Video, a repository of rights-free videos that people could download for education and research purposes. Open Video acquired about 500 videos from NASA, which Open Video segmented and indexed. Archivist and filmmaker Rick Prelinger donated many films from his library to Open Video before he allied with the Internet Archive. Open Video even donated hundreds of videos to Google Video before Google acquired YouTube.

In 2000, around the time that NDIIPP was formed, Marchionini started discussing video preservation with his colleague Helen Tibbo and others. He concluded that one of the intriguing aspects of preserving video from online would be to also capture the context in which the video existed.

Marchionini said, “What kind of context would you need, say in 2250, if you see a video of some kids putting Mentos in Coke bottles and squirting stuff up in the air? You would understand the chemistry of it and all that but you would never understand why half a million people watched that stupid video at one time in history.”

“That’s where you need the context of knowing that this was the time when YouTube was happening and people were discovering ways to make their own videos without having to have a million dollar production lab or a few thousand dollars worth of equipment. The importance of it is that the video is associated with what was going on in the world at the time.”

With NDIIPP grant money, by way of the National Science Foundation, Marchionini and his colleagues created a tool called ContextMiner, a sort of tightly focused, specialized web harvester that is driven by queries rather than link following. A user gives ContextMiner a query or URL to direct to YouTube, Flickr, Twitter or other services. In the case of YouTube, ContextMiner then regularly downloads not only the video files returned from the search but whatever data on the page is associated with each video. A typical YouTube page will have comments, ratings and links to related videos. For a while, ContextMiner even harvested incoming links, which placed the video in a sort of contextual constellation of related topics.
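The query-driven pattern described above (as opposed to link-following crawling) can be sketched like this; `search` and `fetch_context` stand in for real service APIs, and the record fields are assumptions based on the description, not ContextMiner's actual schema:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class HarvestRecord:
    video_url: str
    comments: list = field(default_factory=list)  # page context, not just the video
    rating: Optional[float] = None
    related: list = field(default_factory=list)   # links to related videos

def harvest(query: str,
            search: Callable[[str], list],
            fetch_context: Callable[[str], dict]) -> list:
    """Run a query against a service and save each result together with
    the surrounding page context -- comments, ratings, related links --
    placing the video in its contextual constellation."""
    records = []
    for url in search(query):
        ctx = fetch_context(url)
        records.append(HarvestRecord(url,
                                     ctx.get("comments", []),
                                     ctx.get("rating"),
                                     ctx.get("related", [])))
    return records
```

Rerunning the same query on a schedule, as ContextMiner did, turns this into a time series of context snapshots for each video.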

The inherent educational value of video is that it can show a process. You can either read about how to juggle or how to tie your shoelaces, or you can watch a demonstration. Modeling communicates processes more effectively than written descriptions of processes.

Marchionini also sees video as a means of recording a process for research purposes. As an example, he described a situation where he wanted to capture and review the actions of users as they conducted queries and negotiated the search process.

He said, “I wanted to see a movie of a thousand people’s searches going through these states, from query specification to results examination and back to queries. Video is a way to preserve some things that have dynamics and interactions involved, things that you just can’t preserve in words. This is critical for showing processes, such as interaction dynamics, in a rapidly changing web environment. Because old code and old websites may no longer work, video is an important tool to capture those dynamics. That’s the only way I have of going back and saying, ‘Ten years ago, here were these interfaces we were designing and here’s why they worked the way they did.’ And I show a video.”

Today Marchionini is dean of the UNC School of Information and Library Science and he heads its Interaction Design Laboratory. The results of Marchionini’s research over the years have influenced our daily human/computer interaction in ways that we’ll never know. Interfaces will continue to evolve and be refined, but it is important to remember the work of people like Marchionini, who did the early research and testing, labored on the prototypes and laid the foundation of effective human-computer interface design, making it possible for modern users to interact effortlessly with their devices.

Professors may not get the glory and attention that their work deserves but that’s not the point of being a teacher. Teachers teach. They pass their knowledge along to their students and often inspire them to create the Next Big Thing.

“University professors create ideas and prototypes and then the people who get paid to build real systems do that last difficult 10% of making something work at scale,” said Marchionini. “We train students. And it’s the students that we inspire, hopefully, who go on to industry or government work or libraries. And they put these ideas into place.

“My job is ideas and directions. Some stick and others do not. I hope they all get preserved so we can learn from both the good ones and the not-so-good ones.”


Categories: Planet DigiPres