The Signal: Digital Preservation


Before You Were Born: The Hardware Edition

12 November 2013 - 5:48pm

I increasingly deal with vintage hardware. Why? Because we have vintage media in our collections that we need to read to make preservation and access copies of the files stored on them.

I spend a lot of time thinking about hardware that I have interacted with and managed over the years. Some of it was innovative and exhibited remarkable adaptive uses, yet is sadly forgotten.

Original shipping box for a CueCat from Wired Magazine in 2000, photo by Leslie Johnston

I cannot leave out the Telex, one of the earliest technologies to have a lasting effect on our practices today. Telex was networked telecommunications and teleprinting from 1933.

In the same vein, I think every archival professional knows something about the Memex, proposed by Vannevar Bush in his article “As We May Think” in The Atlantic in July, 1945. While this posited the use of early hypertext navigation, it was an interface to static microfilm.

The compact cassette – yes the cassette tape of our youthful mix tapes – was used for data storage on home computers in the 1970s and 80s.  I most vividly remember friends in high school seeking out cassettes with clear leaders for the loading of software and data on TRS-80 home computers.  Interestingly, cassettes may be coming back in a revived form as a storage medium.

I remember some of the early word processors, but one of the most interesting appears to be the DECmate from 1977, a PDP-8 compatible _desktop_ computer running word processing software, meant, according to its advertising, for “office workers.”

My colleague Jimi Jones quipped that any technology that ends in “-vision” should be on my list.  Cartrivision analog video cassettes for consumer film distribution and for recording from 1972.  The Polavision instant movie camera from 1977. The Magnavox Magnavision laserdisc player from 1978.  The Selectavision Capacitance Electronic Disk video disc player from 1981.   The Fisher-Price PixelVision camera (with cassette storage) from 1987.

While writing this post I was introduced by my colleague Jerry McDonough to the short-lived Vectrex. A color, true 3D vector graphics display for home gaming in 1982. And gone from the market in 1984.

How about the GRiD Compass laptop from 1982? Rugged, with a graphical interface. It used bubble memory, very high capacity non-volatile memory for its day. And it was the first laptop to go into space. In 1991 GRiD introduced the GRiDPad SL, one of the first pen-based Tablets.

While writing this, a friend introduced me to the DECtalk, a text-to-speech synthesizer from 1984. It could work as an interface to an email system and had the capability to function as an alerting system by interacting with phone systems via touch tones.

While I never used one, I was fascinated by the description of the development of the Thunderscan, a hardware adapter with accompanying software to turn an Apple ImageWriter into a scanner, which hit the market in 1984.

I worked with a Sony Mavica digital camera in the mid-1980s. Yes, digital. While the first version was an analog signal, later versions, such as the one I worked with, were digital, and wrote onto floppy disks.

How many people remember the NeXT, introduced in 1988?  Perhaps it’s not fair to list this under hardware, because its OS, OpenStep object-oriented development tool, and WebObjects web application development framework were just as influential as the hardware, if not more. It was one of the earliest high-end workstations aimed at the scientific and higher education computer simulation market, with fast chips and a lot of memory for the time, and magneto-optical storage, and it was truly WYSIWYG for layout and printing. You might remember that Sir Tim Berners-Lee wrote the first web browser on a NeXT and also used the machine as the first web server…

The QuickCam was one of the first widespread consumer webcam devices in 1994, although neither the web nor videoconferencing was ubiquitous yet.  (Tangentially, my first real experience with videoconferencing was a job interview in late 1995.)

In 1996 the Palm personal digital assistant appeared in the market. I had four different models over the years. It was one of the earliest devices to support syncing between email and calendars on both Windows and Mac systems, and had a touchscreen for gestural writing capture using its Graffiti writing system.  Of course it owed a huge debt to the Apple Newton, in development from 1987 and released in 1993, with its Notes, Names, and Dates applications and other productivity tools, and its true handwriting recognition.  I was also reminded by a colleague about the Sharp Wizard from 1989, with a memo pad, calendar and scheduling with alarms and repeating events, and a calculator.  I had completely forgotten that I once had one of these in my household when they were new.

I will end with one of my sentimental favorites. In 2000 I received an odd little box in the mail as part of my Wired magazine subscription. That box contained a CueCat, a home barcode scanner. It was meant to plug into home computers to read barcodes in print magazines to take you to targeted web sites.  It was described by PC World in 2006 as “One of the 25 Worst Tech Products of All Time.” Now of course we all have barcode readers in our phones to interact with barcodes and QR codes everywhere.  I still have my CueCat and the Wired box. And there are home library cataloging tools to this day that can still work with them.

What are your favorite forgotten innovations in hardware?

Categories: Planet DigiPres

November Library of Congress Digital Preservation Newsletter Now Available

8 November 2013 - 7:22pm

The November 2013 Library of Congress Digital Preservation Newsletter is now available!

In this issue:

  • Digital Preservation Pioneer:  Sam Brylawski
  • Welcome NDSR Inaugural Class!
  • New Report:  Preserving.exe
  • Digital Portals to State and Community History
  • NDSA Report on Geospatial Data
  • Lists of Upcoming events and educational courses
  • Interviews with Edward McCain and Emily Gore
  • Articles on personal digital archiving, meetings reports, new resources, and more

Subscribe directly here, and get the newsletter automatically every month!

Categories: Planet DigiPres

A Logical Employment in Web Archiving

8 November 2013 - 1:13pm

The following is a guest post by Philip Ardery, the newest member of the Library’s Web Archiving team.

Philip Ardery. Photo by Susan Manus.

I can trace my interest in computers and technology back to a single factor of my childhood: my family’s perpetually faulty home internet connection. While my multitude of siblings continually cursed and physically writhed over the frequent network disconnects, my parents stood by powerless, unconscious even of how to turn our computer on—though they were swift in realizing that they could unplug the machine to turn it off. I quickly learned that, in order to end the madness, I had to figure out how to fix the thing myself.

Flashing forward a decade or so, it makes perfect sense that I found myself fresh out of college employed as a technical support analyst. But, if you back up a year or two, the logic begins to fail.

In 2010 I graduated from Kenyon College with a Bachelor of Arts degree in English. Despite having a continued interest in technology, the thought of pursuing a computer science degree had not even crossed my mind. After graduating, however, and stepping out into the real world with my stylish yet less-than-accommodating liberal arts degree, I began kicking myself for not considering a more dynamic and practical degree four years earlier. Nonetheless, my natural inclination for technical problem solving eventually resurfaced as I began learning about and enjoying computers again through my employment as a support analyst at FICO, a job that afforded me a wonderful crash course in Unix-based operating systems.

My new role as an Information Technology Specialist with the Web Archiving team of the Library’s Office of Strategic Initiatives represents an exceptionally ideal opportunity for me. Not only does it appeal to both my literary background and my love of technology, but it also incorporates my third most notable life passion: the internet! Despite some of its more questionable quirks, I firmly believe that history will look back on the internet in an ultimately favorable light, as one of mankind’s greatest inventions. Consequently, I am ecstatic about this opportunity to work with some of the most influential leaders in the internet archiving community and to contribute my part to this outstanding Library of Congress initiative.

As the newest member of the Web Archiving team, I will focus on supporting large data transfers relating to the Library’s various collections of archived web content, contributing to the greater internet archiving community’s expanding standard of best practices, and refining internal procedures to accomplish the team’s long-term goals more effectively and efficiently, while also providing a wide range of general troubleshooting support as needed. I greatly look forward to the challenges ahead of me and am eager to learn, contribute, and accomplish as much as I can in this outstanding work environment. I invite all of you to introduce yourself and let me know if I can help you with anything!

Categories: Planet DigiPres

Guitar, Bass, Drums, Metadata: Musical Context for Long-term Preservation

7 November 2013 - 4:21pm

Those of us in the “cultural heritage” sector get used to being at the end of the line sometimes. With very few exceptions, the unique items that end up in our collections usually get here after all their primary value has been extracted.

While we’d love to have a more regularized path for the treasures to get here, it’s actually to our benefit that creators and intermediaries have such strong incentives to steward and properly preserve their digital materials.


Metadata Madness wheel by user musebrarian on Flickr

This is especially true in the music industry, where artists and record labels are still struggling to turn their digital art into gold. Digital music files are valuable cultural artifacts in their own right, but before they become “artifacts” they’re valuable assets that need to be managed for the long term in order to sustain their earning potential.

There are tremendous opportunities for the cultural heritage community to leverage existing digital music workflows and to engage with the music community to implement digital stewardship processes for the benefit of all.

The best way to do this is to tap into existing initiatives and processes for managing digital music data. Nothing is currently hotter in the technical side of the music biz than discussions on metadata. For example, a new Recording Academy initiative called “Give Fans the Credit” is an effort to brainstorm ways to deliver more robust crediting information on digital music platforms.

The preservation benefits of rich metadata have long been apparent to NDIIPP. Metadata projects made up a number of 2007’s Preserving Creative America projects, including the “Metadata Schema Development for Recorded Sound” project, which focused on creating a standardized approach for gathering and managing metadata for recorded music and developing software models to assist creators and owners in collecting the data. The project ultimately developed the Content Creator Data tool, an open-source application that captures metadata at the inception of the recording process.

The NDIIPP connections with the music industry don’t stop there. John Spencer of the MSDRS project, a current member of the National Digital Stewardship Alliance Coordinating Committee, is also a participant in the Music Business Association’s Digital Asset Management Workgroup. The workgroup is co-chaired by Paul Jessop, a former chief technology officer for the RIAA, and Maureen Droney of the Recording Academy, Producers and Engineers Wing, who joined us for a conference panel a couple of years ago.


Music metadata doesn’t look the same these days! Schumann’s “Erinnerung” by user pfly on Flickr

“I think there are two important on-going efforts that the music community is beginning to embrace,” said Spencer in a recent exchange. “One is that artists and performers are beginning to understand the importance of unique identifiers to define their ‘digital presence’ related to musical works. With the need to further automate the collection of royalties because of new delivery technologies, getting artists and performers to understand the importance of these identifiers is a place where the digital stewardship folks could help by showing examples of how they have implemented identifiers in their given space.”

The DAM group is working to “coordinate and standardize all non-recorded music assets relevant to the digital music value chain, such as artist images, credits, liner notes, archival assets, and more.” To that effect, they spearheaded last year’s publication of “MetaMillions: Turning Bits Into Bucks for the Music Industry Through the Standardization of Archival and Contextual Metadata.” The  paper looks at the current state of metadata collection and curation in the music industry and explores how the data is being shared at each stage of the lifecycle, with an emphasis on showcasing the sales and marketing rationale for a more standardized metadata framework.

The Producers and Engineers Wing will soon release an update of the “Recommendation for Delivery of Recorded Music Projects” (PDF).  This report “specifies the physical deliverables that are the foundation of the creative process” and “recommends reliable backup, delivery and archiving methodologies for current audio technologies, which should ensure that music will be completely and reliably recoverable and protected from damage, obsolescence and loss.”

More recently, the Music Business Association has hosted a Music Industry Metadata Summit and is working to expand the uptake of work being done by the Digital Data Exchange, a not-for-profit organization creating standards for the transmission of metadata between systems along the music supply chain. DDEX has established a working group focused on studio metadata, chaired by the aforementioned Mr. Spencer, with the release of specifications still to be determined (though we should note that they have already published a wide variety of other standards and recommendations).

This intense focus on metadata by the creation and intermediary management ends of the music industry should provide immense benefit to stewarding institutions once they ultimately take possession of the materials. Still, there are aspects of stewardship that may not be addressed by the current metadata efforts on the creation side, and the input of stewardship professionals could add lots of value.

So what are the most effective ways for the cultural heritage community to engage with the music community?

“Currently, I believe DDEX is a key piece of the puzzle,” said Droney in a recent conversation, “as it is the only organization working on actual standards for music business metadata. Standardization of the collection and transmission of recording studio metadata is the goal. In the meantime, educating the music community about best practices, both for the collection of credits and other technical and descriptive information, and for the short- and long-term archiving of masters, are important first steps. Also of note, the Audio Engineering Society has taken a serious interest in the National Recording Preservation Plan (PDF), and at the recent AES convention in NYC there were a number of tracks related to audio archiving and preservation that were inspired by the Plan.”

Categories: Planet DigiPres

The National Digital Newspaper Program Accepting Proposals: Apply Now!

6 November 2013 - 9:05pm

The following is a guest post by Leah Weinryb-Grohsgal, program officer in the Division of Preservation and Access at the National Endowment for the Humanities.


The Washington times. (Washington [D.C.]), 04 Oct. 1921. Chronicling America: Historic American Newspapers. Lib. of Congress.

The National Endowment for the Humanities is now accepting proposals for the National Digital Newspaper Program.  The National Digital Newspaper Program is a partnership between NEH and the Library of Congress to develop a searchable database of historically significant newspapers published in the United States.  The Library of Congress hosts the site for this project at Chronicling America, a collection of information and digitized newspapers published in the U.S. and territories between 1836 and 1922 available on the web for anyone to use.  The collection can now accept not only English titles, but Spanish, French, Danish, German, Hungarian, Italian, Norwegian, Portuguese and Swedish publications as well.

Each year, NEH and the Library of Congress seek to add more historic newspapers to the site, which currently includes more than 6.6 million pages and 1,100 titles.  Each award is made in the form of a cooperative agreement that establishes a partnership between NEH and the applicant institution, with technical support provided by the Library of Congress.  Awards support 2-year projects to digitize 100,000 newspaper pages from a state, primarily from microfilm negatives.  A list of the 36 institutions currently participating in the National Digital Newspaper Program may be found at

NEH hopes eventually to support projects in all states and U.S. territories.  One organization within each U.S. state or territory will receive an award to collaborate with state partners.  Previously funded projects are eligible to receive supplementary awards for continued work, but the program will give priority to new projects, especially those from states and territories that have not received NDNP funding in the past.  New applicants are welcome to propose projects involving collaboration with previous partners, which might involve an experienced institution managing the creation and delivery of digital files, consulting on the project or providing formal training to the project staff of a new institution.

US States awarded NDNP Grants, 2005-2013.

NDNP projects focus on:

  • Selecting newspaper titles to be digitized and analyzing available microfilm for optimal scanning
  • Digitizing page images from microfilm, preparing optical character recognition files, and creating relevant metadata
  • Delivering files and metadata to the Library of Congress in conformity with technical guidelines
  • Updating bibliographic records of digitized titles in WorldCat
  • Identifying free open access newspapers in the state or territory for inclusion in the CA newspaper directory

Proposals are now being accepted from institutions wishing to participate in the National Digital Newspaper Program.  For more information, please visit the program’s funding page at, and technical guidelines at  Guidelines may be found at

Applications are due January 15, 2014.

Categories: Planet DigiPres

Anatomy of a Web Archive

5 November 2013 - 8:44pm

The following is a guest post by Nicholas Taylor, Web Archiving Service Manager for Stanford University Libraries.

I’m inclined to blame the semantic flexibility of the word “archive” for the fact that someone with no previous exposure to web archives might variously suppose that they are: the result of saving web pages from the browser, institutions acting as repositories for web resources, a navigational feature of some websites allowing for browsing of past content, online storage platforms imagined to be more durable than the web itself, or, simply, “the Wayback Machine.” For as many policies and practices guide cultural heritage institutions’ approaches to web archiving, however, the “web archives” that they create and preserve are remarkably consistent. What are web archives, exactly?

WARC, West African Research Center, by Robin, on Flickr

At the most basic level, web archives are one of two closely-related container file formats for web content: the Web ARchive Container (WARC) format or its precursor, the ARchive Container (ARC) format. A quick perusal of the data formats used by the international web archiving community shows a strong predominance of WARC and/or ARC. The ratification of WARC as an ISO standard in 2009 made it an even more attractive preservation format, though both WARC and ARC had been de facto standards since well before then. First used in 1996, the ARC format is more specifically described by the Sustainability of Digital Formats website as the “Internet Archive ARC file format”, a testament both to the outsized contribution of the Internet Archive to the web archiving field and to the recentness of the community’s broadening membership.

Anatomically, a WARC or ARC file can be thought of as a single document made up of a series of concatenated records. For the WARC format, these records can be one of eight different types, the most predictable of which represents an archived resource (e.g., HTML, JavaScript, image, video, Flash, etc.) retrieved from the web. Examples of other record types include crawler characteristics, HTTP responses, HTTP requests, resource capture details, pointers to previously-captured content (i.e., when crawler-based content de-duplication is enabled), alternate formats for previously-captured content (e.g., format obsolescence use case), and resources spanning multiple WARC files. Aside from the field designating the record type, there are three other mandatory fields found in the header of every WARC record: a record identifier, the record body size, and a timestamp.
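To make the record anatomy a little more concrete, here is a minimal Python sketch that reads one uncompressed WARC record and checks for the four mandatory header fields. The sample record is hand-written for illustration rather than taken from a real crawl, the function names are my own, and the parser ignores details a production reader would need to handle (such as folded header lines), so treat it as a rough illustration rather than a reference implementation.

```python
# The four mandatory header fields: record type, record identifier,
# record body size, and timestamp.
MANDATORY_FIELDS = ("WARC-Type", "WARC-Record-ID", "Content-Length", "WARC-Date")

# A hand-written, hypothetical record: version line, headers, blank line,
# body, then the two CRLFs that close a record.
SAMPLE_RECORD = (
    b"WARC/1.0\r\n"
    b"WARC-Type: resource\r\n"
    b"WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000001>\r\n"
    b"WARC-Date: 2013-11-05T20:44:00Z\r\n"
    b"WARC-Target-URI: http://example.org/\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
    b"\r\n\r\n"
)


def parse_warc_record(data: bytes, start: int = 0):
    """Parse one uncompressed record beginning at byte offset `start`.

    Returns (version, headers, body, next_offset), where next_offset is
    where the following concatenated record would begin.
    """
    header_end = data.index(b"\r\n\r\n", start)
    lines = data[start:header_end].decode("utf-8").split("\r\n")
    version = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values such as URNs and
        # timestamps keep their own colons.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    body_start = header_end + 4
    length = int(headers["Content-Length"])
    body = data[body_start:body_start + length]
    next_offset = body_start + length + 4  # skip the closing CRLF CRLF
    return version, headers, body, next_offset


def missing_mandatory_fields(headers):
    """Return the mandatory field names absent from a parsed header dict."""
    return [name for name in MANDATORY_FIELDS if name not in headers]
```

Calling `parse_warc_record(SAMPLE_RECORD)` and passing the resulting headers to `missing_mandatory_fields` gives a quick sanity check; because records are simply concatenated, the returned `next_offset` can be fed back in as `start` to walk a whole file.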

This extensive technical metadata is what distinguishes a web archive from, say, a copy of a web page. Aside from testifying to the provenance and facilitating temporal browsing of the archived data, the variety and ubiquity of record headers also creates intriguing opportunities for metadata extraction and analysis.

Lego Bin, by Josh Hallett, on Flickr

As for the archived resources themselves, objects from different parts of the same website or multiple websites may be placed at random in one or more WARC files. The arbitrary packaging of harvested content facilitates parallelization of crawling and efficiencies in storing assets common to multiple sites (e.g., JavaScript libraries) but also explains the relatively slower load times of sites in the Wayback Machine; every single object that makes up the page must be unpacked from an arbitrary offset in many different files.
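The "unpacked from an arbitrary offset" step can be sketched in a few lines of Python. This sketch rests on two assumptions that the paragraph above does not spell out: that each record is compressed as its own gzip member, and that some index maps each resource to the byte offset of its member. The function names, keys, and payloads are all hypothetical.

```python
import gzip
import zlib


def build_archive(records):
    """Concatenate payloads, each compressed as its own gzip member.

    Returns (blob, index), where index maps each hypothetical key to the
    byte offset at which its gzip member starts.
    """
    blob = bytearray()
    index = {}
    for key, payload in records:
        index[key] = len(blob)
        blob += gzip.compress(payload)
    return bytes(blob), index


def read_member(blob: bytes, offset: int) -> bytes:
    """Decompress exactly one gzip member starting at `offset`."""
    # wbits=31 tells zlib to expect a gzip header and trailer; decompression
    # stops at the end of the first member, and later members are ignored.
    decompressor = zlib.decompressobj(wbits=31)
    return decompressor.decompress(blob[offset:])
```

With this layout a replay tool never has to read an archive file from the beginning: it looks up the offset, seeks, and decompresses one member. Rendering a single page still means repeating that lookup for every embedded object, possibly across many files, which is the slowdown described above.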

If you want to see for yourself, an appendix to the draft WARC specification contains examples of each of the WARC record types, including archived resources. Internet Archive also provides a set of test WARC files for download. Since record headers and text-based payloads such as HTML are stored as plain text (binary payloads such as images are stored as raw bytes), the files are surprisingly legible once unzipped and opened in a text editor. It’s not as seamless a way to navigate the past web as, say, Wayback Machine or Memento, but it will give a deeper understanding of the well-considered and widely-used data structure that makes those technologies work.

Categories: Planet DigiPres

Connecting Communities: FADGI Still Image Working Group’s Impact on the Library of Congress and Beyond

4 November 2013 - 3:31pm

The following is a guest post from Carla Miller of the Library of Congress. This is the second in a two-part update on the recent activities of the Federal Agencies Digitization Guidelines Initiative. This article describes the work of the Still Image Working Group. The first article describes the work of the Audio-Visual Working Group.

While attending a Federal Agencies Digitization Guidelines Initiative Still Image Working Group meeting earlier this summer, I suddenly saw everything come together. What I mean by that is I realized how the digital preservation work performed by my team at the Library of Congress intersects and relates to the work being performed by other divisions within the Library as well as other government agencies.

Participants at the meeting came from multiple agencies throughout the federal government and from various divisions within the Library of Congress. Participants included:

One current imaging project at the Library is the Rudimentum Novitiorum, a textbook for novice monks that explains the history of the world. The Library’s copy was printed in Germany in 1475. Photo by Carla Miller.

At the Library of Congress, Dr. Lei He is an imaging scientist who is currently researching the effects of compression on digital images. Dr. He also uses quantitative methods to analyze “edges” found in images. “Edges” are naturally occurring high contrast areas of photographs that can be used to determine what resolution is needed for digitization. Dr. He’s research is already improving the processes at the Library of Congress. Similar analyses done on the Farm Security Administration photo collection at the Library determined a higher scanning resolution was required for groups of negatives in the collection. This determination was especially significant because many historic negatives are deteriorating, which means this may be the last chance to digitize them for preservation and access.

Another type of research and testing is being done by Don Williams of Image Science Associates, an expert consultant for the Library of Congress. Don works with Steve Puglia and Dr. Lei He at the Library to develop software and image targets for assessing image performance. The software is known as DICE (Digital Image Conformance Evaluation), and using targets it analyzes the quality of the actual image capture to help determine both if the product quality expected is occurring and if that quality is consistent throughout the workflow. One important aspect of the DICE targets is that they are produced with spectrally neutral gray patches; many neutral patches on color/grayscale targets are not. A spectrally neutral target for transmissive materials (think photographic negatives rather than printed photos) is also in development.

Library staff member Ronnie Hawkins uses imaging software to verify the quality of the photos taken. It is extremely important that the images be true to the originals. Photo by Carla Miller.

The Library of Congress uses the DICE targets to test scanning equipment and to verify output quality. The DICE software is also used in quality assurance and quality control testing for digitization projects funded by the Library. This testing and analysis assures consistent quality across projects. It also ensures that the final product will be as true to the original as possible, an aspect that is often important for users of the Library’s digitized collections.

In a joint effort with the Government Printing Office and the National Archives and Records Administration, Library staff members have developed a matrix of file format comparisons. Five formats for still images were chosen for analysis: PNG, TIFF, JPEG, JPEG2000 and PDF. The group compared sustainability and cost factors for implementation and storage. The final draft of this document will be available for public comment on the FADGI site within the next couple of weeks.

The research work being done at the Library benefits other Federal agencies as well. In fact, the entire purpose of FADGI is for Federal agencies to collaborate and share information and best practices on digitizing our various collections and records. Some examples of these collaborations were shared at our most recent meeting: Don Williams will be working with the National Anthropological Archives at the Smithsonian on the digitization of endangered manuscript materials. The Smithsonian will work with the Library on standardized language we use in contracts requiring the use of DICE targets as an objective measurement of scanning devices. And in a general sense, the research we do often informs the development of policies, protocols and workflows throughout the Library and various other agencies.


Categories: Planet DigiPres

Library of Congress Contributes Chapter to New Personal Digital Archiving Book

1 November 2013 - 5:38pm

Information Today recently published Personal Archiving: Preserving Our Digital Heritage, a collection of essays written by some of the leading practitioners, thinkers and researchers in the emerging field of personal digital archiving. We are honored that Information Today — and especially the book’s editor, Donald Hawkins — asked us to share our resources and experiences by contributing an essay to the book.

The term “personal digital archiving” can be interpreted in different ways, but I think it generally applies to digital preservation at the individual level as opposed to the institutional level. I say that the term “generally applies” because the concept of personal can be slippery to define.

Personal digital archiving could equally apply to individuals interested in securely saving their digital photos, families sharing and archiving all manner of born-digital and digitized memorabilia, local history and genealogy groups trying to deal with the increasing influx of digital material, public libraries acquiring non-commercial digital collections from the communities they serve and academics taking responsibility for the preservation of their digital professional works. So, for Personal Archiving: Preserving Our Digital Heritage, editor Donald Hawkins chose authors with a range of backgrounds and interests.

Summarizing the book might not do it justice, so here’s a quick look at the contents.

Brewster Kahle, visionary founder of the Internet Archive, wrote the introduction, in which he addresses personal digital archiving as an emerging societal phenomenon. “Excitement is growing as researchers learn from one another and welcome the type of sharing culture that comes before commercial players enter a field,” said Kahle.

Jeff Ubois, the founder of the annual Personal Digital Archiving conference, gives the informed, high-level view in his essay, “Personal Archives: What They Are, What They Could Be and Why They Matter.”

Danielle Conklin wrote, “Personal Archiving for Individuals and Families,” in which she examines the approaches that four different individuals take to their personal digital archiving projects.

I wrote “The Library of Congress and Personal Digital Archiving,” which summarizes the Library of Congress’s efforts to date: our print, video and audio resources; our outreach events and educational presentations to the general public and our collaboration with the Public Library Association to spread awareness of personal digital archiving resources into local communities. The essay also covers our general step-by-step advice for preserving personal digital valuables.

Editor Donald Hawkins wrote, “Software and Services for Personal Archiving,” in which he assesses media collection systems for photos and documents, notes, email archives and home movies and videos.

Evan Carroll, one of the leading experts in the complexity of digital-age estate planning, wrote “Digital Inheritance: Tackling the Legal Issues.”

Catherine Marshall, of Microsoft Research, wrote “Social Media, Personal Data and Reusing Our Digital Legacy.” Marshall specializes in objective research into what people actually do or don’t do with their digital stuff — human nature versus best practices.

Jason Zallinger, Nathan Freier and Ben Shneiderman co-wrote, “Reading Ben Shneiderman’s Email: Identifying Narrative Elements in Email Archives,” in which they analyzed 45,000 of Shneiderman’s emails for narrative elements.

Ellysa Stern Cahoy wrote “Faculty Members as Archivists: Personal Archiving Practices in the Academic Environment.”

In “Landscape of Personal Digital Archiving Activities and Research,” author Sarah Kim goes into the kind of exhaustive, comprehensive detail that only a PhD candidate can.

Aaron Ximm wrote “Active Personal Archiving and the Internet Archive” in which he details how the Internet Archive is already a public resource for personal digital archiving and he suggests some futuristic possibilities for the IA in actively capturing and preserving networked personal histories.

In “Our Technology Heritage,” Richard Banks of Microsoft Research details his philosophic and scientific observations about the intersection of the material and digital worlds, and their implications for next-generation technology.

Donald Hawkins, Christopher Prom and Peter Chan write about three interesting research projects in “New Horizons in Personal Archiving: 1 Second Everyday, myKive and MUSE.”

And appropriately, the book concludes with an essay from Clifford Lynch,  “The Future of Personal Digital Archiving: Defining the Research Agendas.” One of Lynch’s gifts is his ability to make sense of concepts like personal digital context within broader contexts — in the entire informational and cultural ecosystem — and extrapolate where things might evolve next. Lynch is one of academia’s great explainers.

Categories: Planet DigiPres

One Format Does Not Fit All: FADGI Audio-Visual Working Group’s Diverse Approaches to Format Guidance

31 October 2013 - 5:48pm

This is the first in a two-part update on the recent activities of the Federal Agencies Digitization Guidelines Initiative. This article describes the work of the Audio-Visual Working Group. The second article, to be published on November 4th 2013, describes the work of the Still Image Working Group.

Macroblocking: demolish the eerie ▼oid by Rosa Menkman on Flickr turns video errors into art.

I wish I had a quick and easy answer when colleagues ask what file format they should use to create and archive digital moving images. My response usually starts out with “well, it depends.” And indeed it does depend on a wide variety of factors: what they want to achieve with the file, what equipment and storage space are available, and whether they are reformatting old videotapes or creating new born-digital material. The list of considerations that can affect the decision goes on. As a community, our general rule is to “make the best file that you can afford to create and maintain,” but what makes one format better than another in a given situation? (BTW: in this context, the term file format is understood to mean both the file “wrapper,” e.g., mov, avi, and mxf, and the encoding in the wrapper, e.g., uncompressed, H.264, and JPEG 2000.)

The Federal Agencies Digitization Guidelines Initiative Audio-Visual Working Group, with active members from across the Library of Congress including the Packard Campus for Audio-Visual Conservation and American Folklife Center as well as the National Archives and Records Administration, Smithsonian Institution Archives, and National Oceanic and Atmospheric Administration among others, has four subgroups working on informative guidance products to help answer the age-old question, “what should I do with my moving image collections?”

Video Efforts

The lead video effort, now in its third year, entails the development of a specification for the use of the MXF format, in effect a special profile of this wrapper tailored to serve preservation.  The specification is dubbed AS-07 and it is not only of general interest to the community but directly supports the work of the Packard Campus, where a version of MXF with JPEG 2000 picture encoding has been in use for several years.  Everyone expects that the publication of the AS-07 specification will increase the adoption of this format.  Meanwhile, however, there are other formatting options to consider, especially by smaller archives or for classes of content that are less complex than, say, the broadcast collections that are an important part of the Packard Campus holdings.

The Working Group’s interest in exploring this wider range of options has led to the formation of the Digitized Video subgroup, spearheaded by staff from NARA’s Video Preservation Lab.  Taking a lead from the work of the FADGI Still Image Working Group, this subgroup is building a matrix to compare target wrappers and encodings against a set list of criteria that come into play when reformatting analog videotapes. The evaluation attributes include format sustainability, system implementation, cost, and settings and capabilities. The matrix and companion documents will be available for review on the FADGI website in the coming months.

The just-off-the-ground Born-Digital Video subgroup, led by staff from the American Folklife Center at the Library of Congress, is taking a lifecycle approach to born-digital video by focusing on guiding principles. Through visual examples and case histories, the subgroup’s product will illustrate the cause and effect of the range of decisions to be made during the creation and archiving lifecycle of a born-digital video file. This work will be geared for both file creators (such as videographers and others who create new digital video files) and file archivists (such as librarians and archivists and others who receive files from creators and have to archive and/or distribute them). For file creators, we want to emphasize the advantages of starting with high quality data capture. For file archivists, we want to explore options for identifying the composition of video files and evaluating their characteristics to better understand if action is warranted and if so, when the action needs to be taken.

Both the Digitized and Born-Digital video subgroup efforts build on the useful 2011 report by George Blood for the Library of Congress titled Determining Suitable Digital Video Formats for Medium-term Storage.  In addition, our format comparisons will support the ongoing work of the International Association of Sound and Audiovisual Archives as they draft a general guideline for video preservation.

Motion Picture Film Efforts

The Film Scanning subgroup, led by staff from NARA’s Motion Picture Preservation Lab, is addressing the issues of digitizing motion picture film. The first product from this group will be an outline of technical components to address when outsourcing film scanning to commercial vendors with a goal of improving access. Other efforts, including the Academy of Motion Picture Arts and Sciences’ Academy Color Encoding System, are focused on improving archival master formats, but until these efforts are ready for prime time, the community is looking for guidance on interim solutions that take possible future uses into account.


Planning for Preservation Storage

30 October 2013 - 8:21pm

Every year the Library of Congress hosts a meeting on Designing Storage Architectures for Digital Collections, aka the Preservation Storage Meeting.  The 2013 meeting was held September 23-24, and featured an impressive array of presentations and discussions.

The theme this year was standards. The term applies not just to media or to hardware, but to interfaces as well. In preservation, it is the interfaces – the software and operating system mechanisms through which users and tools interact with stored files – that disappear the most quickly, or change the least to keep up with changing needs.  The quote of the meeting for me was from Henry Newman of Instrumental, Inc.:  “These are not new problems, only new engineers solving old problems.”

Designing Storage Architectures for Digital Collections 2013 Panel on Developments in Storage Media, photo by Michael Ashenfelder


Library of Congress staff kicked off the meeting by discussing some of the Library’s infrastructure and needs. The Library has reached a point where it has 50% of the files in its storage systems inventoried, so we know what we have, where it is, and who it belongs to, and we have fixity values for future auditing.  We have a wide range of needs, though, which vary with the type of content.  The scale of the data center where text and images are primarily stored is multiple millions of files in tens of petabytes. The scale of the data center where video and audio are primarily stored is 700,000 files in tens of petabytes. The different scales of file numbers and sizes mean different requirements for the hardware needed to stage and deliver this content. In terms of the Library’s storage purchases, 70% are for the refresh of technology and 30% are for capacity expansion.
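
[For readers unfamiliar with the term: a fixity value is typically a checksum recorded for each file so that a later audit can detect files that have changed or gone missing. The sketch below shows the general idea in Python. The function names, the choice of SHA-256 and the report layout are illustrative assumptions, not a description of the Library’s actual inventory system.]

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its fixity value."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, inventory: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root against a stored inventory."""
    current = build_inventory(root)
    return {
        # Files that were inventoried but can no longer be found.
        "missing": sorted(set(inventory) - set(current)),
        # Files present now that were never inventoried.
        "unexpected": sorted(set(current) - set(inventory)),
        # Files whose checksum no longer matches the recorded value.
        "changed": sorted(
            name
            for name in inventory.keys() & current.keys()
            if inventory[name] != current[name]
        ),
    }
```

In this sketch, build_inventory would be run once when content is ingested and its result stored somewhere safe; audit would then be run periodically against the same directory, and any non-empty list in its report would call for a closer look.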

Tape technologies are always a big topic at this meeting. T10K tape migration is ever ongoing. Interfaces to tape environments reach end-of-life and are unsupported within 5-10 years of their introduction, according to Dave Anderson of Seagate.  According to Gary Decad of IBM, rates of areal density increases are slowing down, and the annual rate of petabytes of storage manufacturing is no longer increasing.

Tape is by far the highest MSI (millions of square inches) of storage in production use. Tape, hard disk drive, and solid state storage are surface-area intensive technologies.  Many meeting participants believe that solid state improves hard disk drive technology. Less obvious for preservation concerns is the impact of NAND flash storage on the use of hard drive storage. Replacing enterprise hard disk drives with flash would be exorbitantly expensive, and is not happening any time soon.

Across the board there must be technologies licensed to multiple manufacturers and suppliers for stability in the marketplace. But it is extraordinarily expensive to build fabrication facilities for newer technologies such as NAND flash storage. The same is true for LTO tape facilities, not as much for the expense of building the facilities but for the lack of profitability in manufacturing.  After the presentations at this meeting I am more familiar with the licensing of storage standards to manufacturing companies than I was before, and with the monopolies that exist.

The panel on “The Cloud” engendered some of the liveliest discussion. Three quotes stood out. The first, from Andy Maltz of the Academy of Motion Picture Arts and Sciences: “Clouds are nice but sometimes it rains.” And from Fenella France at the Library of Congress: “I have conversations with people who say ‘It’s in the cloud.’ And where is that, I ask. The cloud is still on physical servers somewhere.” And Mike Thuman from Tessella, referencing his slides, said “Those bidirectional arrows between the cloud and local? They’re not based on Kryder’s Law or Moore’s Law, it’s based on Murphy’s Law. You will need to bring data back.”

David Rosenthal of Stanford University pointed out some key topics:

  • When is the cloud better than doing it yourself? When you have spiky demand and not steady use;
  • The use of the cloud is the “Drug Dealer’s algorithm”: The first one is free, and it becomes hard to leave because of the download/exit/migration charges;
  • The cloud is not a technology, it’s a business model. The technology is something you can use yourself.

Jeff Barr of Amazon commented, “I guess I am the official market-dominating drug dealer.” But Amazon very much wants to know from the community what it is looking for in a preservation action reporting system for files stored in the AWS environment.

The session on standards ranged from an introduction to NISO and the standards development process (with a wonderful slide deck based on clip art), to identifiers and file systems, and the specifics of an emerging standard: AXF.

A relatively new topic for this year’s meeting was the use of open source solutions, such as the range of component tools in OpenStack. HTTP-based REST is the up-and-coming interface for files – the technology is moving from file system-based interfaces to object-based interfaces. Everything now has a custom storage management layer from the vendor.

Other forms of media were also discussed. Two of the most innovative are a stainless steel tape in a hermetically-sealed cartridge engraved with a laser, and another that is visually-recorded metal alloy media.  Optical media is also not dead.  Ken Wood from Hitachi  pointed out that 30-year-old commercial audio CDs are still supported in the hardware marketplace, and that CDs still play. Technically that has just as much to do with the software interface with error correction still being in play as the hardware still being supported. But mechanical compact disc players and storage are disappearing with the rise of mobile devices and thin laptops which have no optical players or hard discs.

Presentations by representatives of the digital curation and preservation community always make up a large percentage of this meeting. Projects such as the Data Conservancy and efforts at the Los Alamos National Labs, the National Endowment for the Humanities, and the Library of Congress were featured. It was noted more than once that content and data creators still do not often feel that preservation is part of their responsibility. The key quote was “You can spend more time figuring out what to save than actually saving it. The cost of curation to assess for retention can be huge.”

You should really check out the agenda and presentations, which are available online.


Looking for a Resource on Personal Digital Archiving?

29 October 2013 - 1:32pm

If so, you are in luck – we have a publication on that very subject.  “Perspectives on Personal Digital Archiving” was published and announced earlier this year, but I think it’s worth a reminder at this point, especially for those that may not have seen it yet.

So, why did we put this together?

Because we are generating more and more content about this topic on our blog, we compiled the relevant posts to make it easier to access all in one place.  Access being, of course, a crucial element in any digital preservation plan.

Many of our readers are already aware that personal digital archiving is, for better or worse, becoming a necessity in our time – that is, a time when more and more of our personal documents are in digital form.  Anyone who owns a digital camera, for example, has probably figured this out.  Remember those days of printed photos placed in photo albums or even stuffed into shoeboxes?  Now, since these items are in digital form, so too must be the storage, and eventually preservation, of those items.  You’d hate to see a treasured physical item become fragile and break apart, but many people are surprised to learn that items in digital form can be even more fragile.

There are many variables that can affect the preservation of a digital item – outdated equipment or software, inaccessible files, lack of backup, etc.  But luckily, there are steps you can take to make sure your documents, photos, and even email, all survive for even the next five years, or longer.

The process isn’t particularly complicated – see our specific advice here for digital photos. But it does require some amount of focused effort to make sure that your treasured personal items are available for your own long-term enjoyment, as well as that of future generations.

“Perspectives on Personal Digital Archiving” contains much general information as well as interviews and step by step instructions, all of which can serve the novice as well as those who may already have some experience. You can access and download this publication (free of charge, of course) on our general personal digital archiving page as well as our publications page.

Here are the chapter headings along with a sample of what’s included:

Under “Personal Digital Archiving Guidance”

  • Four Easy Tips to Preserving Your Digital Photographs
  • Archiving Cell Phone Text Messages
  • What Image Resolution Should I Use?

Under “Personal Reflections on Personal Digital Archiving”

  • One Family’s Personal Digital Archiving Project
  • Personal Archiving: Year End Boot Camp
  • Forestalling Personal Digital Doom

Under “Personal Digital Archiving Outreach”

  • Librarians Helping Their Community with Personal Digital Archiving
  • What Do Teenagers Know About Digital Preservation? Actually, More Than You Think…
  • The Challenge of Teaching Personal Archiving

…and many more.

In the meantime, we are continuing to publish more blog posts all the time on personal digital archiving and related events.  There is also a section on our website devoted to information and resources on the subject.

As always, we welcome any feedback on this resource as well as your own stories and  experiences with personal digital archiving.



Digital Stewardship and the Digital Public Library of America’s Approach: An Interview with Emily Gore

28 October 2013 - 7:18pm

Emily Gore, Director for Content at the Digital Public Library of America

The following is a guest post by Anne Wootton, CEO of Pop Up Archive, National Digital Stewardship Alliance Innovation Working Group member and Knight News Challenge winner.

In this installment of the Insights Interviews series, a project of the Innovation Working Group of the National Digital Stewardship Alliance, I caught up with Emily Gore, Director for Content at the Digital Public Library of America.

Anne:  The DPLA launched publicly in April 2013 — an impressive turnaround from the first planning meeting in 2010. Tell us how it came to be, and how you ended up in your role as content director?

Emily:  I started building digital projects fairly early in my career, in the early 2000s, when I was an entry-level librarian at East Carolina University. In the past, I’ve worked on a lot of collaborative projects at the state level. In North Carolina and South Carolina, I worked on a number of either small scale or large scale statewide collaborations. I led a project in North Carolina for a little over a year called NC-ECHO (Exploring Cultural History Online) and so have always been interested in what we can do together as opposed to what we can do individually or on an institutional level. Standards are important. When we create data at our local institutions, we need to be thinking about that data on a global level.  We need to think about the power of our data getting reused instead of just building a project for every institution — which is where all of us started, frankly. We all started in that way. We thought about our own box first, and then we started thinking about the other boxes, right? I think now we’re beginning to think broader and more globally. It’s always been where my passion has been, in these collaborations, especially across libraries, archives, and museums.

I was involved in the DPLA work streams early on and saw the power of promise of what DPLA could be, and I jumped at the offer to lead the content development. At the time, I had taken an associate dean of libraries position and been at Florida State for about a year, and it was a real struggle for me to think about leaving, after only being somewhere for a year… but I think, I guess we have to take leaps in our life. So I took the leap, and you know, I think we’re doing some pretty cool things. We’ve come really far from when I started last September, really fast. I haven’t even been working on the project for quite a year and we’ve already aggregated millions of objects and we’re adding millions more.

I love all the energy around the project and that a lot of people are excited about it and want to contribute. One of the first projects I coordinated was with a local farm museum, dealing with the actual museum objects, and marrying those with the rich text materials we had in the library’s special collections. And telling a whole story — people being able to actually see those museum objects described in that text. I just saw the power of that kind of collaboration from early on and what it could be more than just kind of a static, each-one-of-us-building-our-own-little-online-presence. The concept of the DPLA has really been a dream for me, to take these collaborations that have been built on the statewide, regional and organizational levels and expand them.

Image of DPLA homepage

Anne: There are ongoing efforts in lots of countries outside the United States to create national libraries, many of which have been underway since before the DPLA. Are there any particular examples you’ve looked to for inspiration?

Emily: Europeana, a multi-country aggregation in Europe, has been around for about five years now. We’ve learned quite a bit from them, and talked to them a lot during the planning phase. They have shared advice with us regarding things they might have done differently if given the opportunity to start again.  One particularly valuable piece of advice has been not to be so focused on adding content to DPLA that we forget to nurture our partnerships and to work with our users.  Of course, my job is largely focused on content and partnerships,  but we really want to make sure that the data we are bringing in to DPLA is getting used, that there are avenues for reuse, that people are developing apps, that we continue to make sure the Github code is updated, and that everything is open and we promote that openness and take advantage of showing off apps that have been built, encouraging other people (through hackathons, for example) to build on what we’ve got.

Europeana has also done a lot of work building their data model, and testing that data model, and making it work with their partners. That’s been a huge help for us starting off, to take their data model and adapt it for our use. They’ve also held rights workshops — Europeana formed 12 standardized rights statements starting with CC-0 and various Creative Commons level licensing, down to rights restricted or rights unknown. We all need to work with our partners to help them understand their rights and their collections better, and to place appropriate rights on them. Most of the collections we see coming in are “contact so-and-so,” “rights reserved,” that kind of thing. This is largely because people are afraid or there is a lack of history regarding rights. We want to work with Europeana and our partners to clarify rights regarding reuse for our end users.  Europeana has started to work with their partners on that, and we want to do that together, so that the rights statements are the same between organizations, and we promote interoperability in that way.

Anne:  So much of the DPLA is based on state hubs and the relationships that existing institutions have with those state hubs. How much collaboration do you see among the states?

[For uninitiated readers: the DPLA Digital Hubs Program is building a national network of state/regional digital libraries and myriad large digital libraries in the US, with a goal of uniting digitized content from across the country into a single access point for end users and developers. The DPLA Service Hubs are state or regional digital libraries that aggregate information about digital objects from libraries, archives, museums, and other cultural heritage institutions within their given state or region. Each Service Hub offers its state or regional partners a full menu of standardized digital services, including digitization, metadata, data aggregation and storage services, as well as locally hosted community outreach programs to bring users in contact with digital content of local relevance.]

Emily: When the DPLA working groups started to examine how we should go about getting content into the DPLA, I remember saying “We should build off of existing infrastructure, because these collaborative projects exist in many states.” They’ve been working with the local institutions for a number of years. So if we can start working with those institutions, then we can build a network and get content. Trust is so important. I think that the small institutions often trust that institution that’s been aggregating their content for a number of years and they might not trust someone from the DPLA coming in and saying, “I want your content.”

The states work extremely well together. We have project leads and other relevant staff from each state or region, and five states and one region covering multiple states right now that we’re working with. We come together to talk about issues that are relevant to all of the states. The models are very different. Some of them have centralized repositories where the metadata work, the digitization work, everything is done in one central place. They work with partners to help provide initial data, and to get the actual objects, but then all the work is done centrally to enhance that metadata and do the digitization work. In other places it’s totally distributed. I’ll take South Carolina as an example. The three major universities in the state have regional scan centers, and they work with the people in their respective regions to get materials digitized, described and online. They’ll accept contributions from institutions who have already digitized their content and provided metadata for that, and then they’ll take it in to their regional repository, and then the three regional repositories are linked together to form one feed. It’s wonderful to hear the exchanges among the hubs, “this is what works in our state, and here are the reasons why.” And they figure out, “Maybe we’ll try this, maybe this will work better to attract folks.”

Anne: Have the state hubs helped build relationships with small institutions? Or how has the DPLA mission and reputation preceded it in these communities?

Emily: In several of the regions, because of the participation of the DPLA, people who refused to partner before are actually saying, “I want my content exposed through the DPLA so can we partner with you?” Partnerships are expanding in the hub states/region as a result of this. I think being at the national level is really helping. I think a lot of [the state hubs] are trying to do outreach and education — they’re doing webinars, they’re talking to people in their state, they’re trying to educate people about what the DPLA is and what the possibilities are. And trying to alleviate fears, where possible. There’s a lot of fear. Even opening metadata, it’s been interesting to see what people’s reactions to that are sometimes. I guess in my mind, I never thought about metadata having any rights. These states have had a challenge explaining what a CC0 license really means for metadata. I think that that has been a hurdle, but most of them are overcoming it, and partners in general are ok with it once they understand the importance of open data. They’re explaining why it’s important, and they’re talking about linked data and the power of possibility in a LOD world, and that that’s only going to happen if data is open.

Anne: How do you effectively provide context for these 4,000,000+ digital records? How do you root a museum artifact in the daily life of that place, and how do you do it within a given state versus across states?

Emily: We’ve done exhibitions of some of the content in the DPLA so far. We have worked with our service hubs to build some initial exhibitions around topics of national interest.  Our goal initially was for different states to work together to help provide data from multiple collections. That happened on a very small scale. Mostly the exhibitions were built with collections from their own institutions, largely because of time constraints we were under to get the exhibitions launched. But also, it’s easier. You know the curator down the hall, you can get permission to get the large-scale images that are needed to actually go in the exhibitions. We did have some exceptions to that; we had a couple of institutions work together and share images with the others. We hope to do more of that — we pulled out 40 or 50 themes of national significance that we could potentially build exhibitions around and there are a number of institutions who want to build more. Right now we’re working on a proposal to actually work with public librarians in several states, to reach some of the small rural public libraries that may have some collections that haven’t been exposed through the hubs, that would in turn help build some of these exhibitions at a national level. And those would be cross-state: local content into national-level topics of interest. We’re also doing a pilot with a couple of library schools on exhibition building. And we’ve given them the same themes, and they’re going to use content that already exists in the DPLA.

Anne: You mentioned hackathons and encouraging people to build things using the DPLA API. What are people building so far?

Emily: To date, I think there are approximately nine apps on the site. There is a cross-search between Europeana and the DPLA — a little widget app where you can search both at the same time and get results, which is awesome. That was built early on. Ed Summers built the DPLA map tool that automatically recognizes where you are so you can look at what DPLA content is available around you. The Open Pics app is iOS-based — you can search and find images around all the topics in the DPLA and use them on your phone. It’s pretty cool. Culture Collage is the same kind of app – it visualizes search results for images from the DPLA. StackLife is a way to visualize book material in a browsing way, like you would actually in the stacks in a library.

We also hope to continue to have hackathons, we’ve talked a little bit to Code for America and hoped to get more plugged in to their community, and we were involved in the National Day for Civic Hacking, and we’re hoping to continue to promote the fact that we do have this open data API that people can interface and build these cool apps with. We really want to encourage more of that.
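
[For readers who want to experiment: the sketch below shows roughly how an app might build a search request against an open API of this kind. The endpoint URL and the parameter names q, page_size and api_key are assumptions based on the publicly documented DPLA API and should be checked against the current documentation; build_search_url is a hypothetical helper, not part of any DPLA library.]

```python
from urllib.parse import urlencode

# Assumed endpoint for item searches; verify against the current DPLA API docs.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_search_url(query: str, api_key: str, page_size: int = 10) -> str:
    """Build a search URL for a keyword query (hypothetical helper)."""
    params = {"q": query, "page_size": page_size, "api_key": api_key}
    # urlencode escapes spaces and other characters that are unsafe in a URL.
    return f"{DPLA_ITEMS_URL}?{urlencode(params)}"


# Example: build a URL that searches for "farm museum", ten results per page.
# The key shown here is a placeholder, not a real credential.
url = build_search_url("farm museum", api_key="YOUR_API_KEY")
```

The resulting URL could then be fetched with any HTTP client. The sketch stops short of making the request, since that requires a real API key.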

Anne: Explain your vision for the Scannebago mobile scanning units.

Emily: When I was working in North Carolina years ago, we did a really extensive collections care survey of all the cultural heritage institutions in the state of North Carolina — about 1,000 institutions.

That survey took five years and two or three different cars! We surveyed these cultural heritage institutions looking specifically at their collections care and the conditions that their collections were in, but also with an eye toward what might need to be preserved for the long term, what needs to be digitized and made available, what are their gem collections that we could essentially help them expose? We saw so many amazing collections that, without physically going to these institutions, you would never ever see. Take the Museum of the Cherokee Indian as an example.

There we discovered wonderful textiles and pottery and other collections that, unless you physically go there, you will likely never see. And of course, like most museums, they only display a small portion of their collection at any time. Otherwise the collections are in storage, and on shelves, and until they rotate those collections in you never see them. It’s not only in North Carolina where we find those examples — it’s everywhere. The ability to see those objects online, I think, is so powerful. And even to potentially tell that rich contextual story, build exhibitions around that, talk about the important history there — I think can be very powerful. But we know that it took a trust relationship for us to even go there and survey their collections. There had to be a trust relationship built, instead of, “Hi, we’re from the state government and we’re coming here to survey your collections.” Obviously that is not really what a lot of people want to hear. So [during the North Carolina survey] we worked with cultural heritage professionals who had existing trust relationships with institutions and they helped us forge our own relationships.  In the end, most institutions were confident that we were indeed only there to survey the collections, and that we had good intentions to help get funding, to help preserve these collections for the long term.

We use that network a lot. We’re not going to get local content without the local people, without the connections, without the trust relationships that have already been built. These people aren’t going to let materials out of their building to be digitized. They’re not going to send them to a regional scan center, or a statewide scan center — they’re just not going to do that. They care about those objects so much — they represent their history, and in many cases they’re not going to let them out of their sight. We have to come to them — how do we do that? Some of these places are up these long winding mountain roads — how in the world do we get up here, and how in the world do we get equipment to them to get this done? That’s where I came up with the concept of a mobile digitization vehicle that I called a Scannebago, a Winnebago shell that we can build out with scanning/camera equipment to get to these rural and culturally rich institutions. That’s the concept.

People ask me about taking content directly into DPLA, and I think the importance is the sustaining of that content. Somebody has to be responsible for the long term maintenance of that content — and at this point, that’s not us. We’re aggregating that content, exposing that content for reuse, but we are not the long-term preserver of that content. And these small institutions are not the long-term preservers of that content either — that’s why the hubs model continues to be important. When we go out with the Scannebago, I still want that digital material to go to the hubs to be preserved for the long term. The Scannebago is another way to make content available with its appropriate metadata through the DPLA, but we really want to see the digital objects preserved and maintained for the long term at some level, and right now that’s through the hubs. It doesn’t have to be geography-based — hubs could be organized around media type or organization type.  But right now, a lot of these relationships exist already based on geography, so it seems logical to continue to build out hubs by geography as we build out other potential collaboratives as well.

The Scannebago has always been a dream — I had really hoped when I was working at the state of North Carolina that we’d be able to do it on some level, and it just didn’t become a reality — but John Palfrey (Head of School at Phillips Academy, Andover and chair of the DPLA board of directors) heard about what I wanted to do and picked it up and was really excited about the potential of doing this. We’re drawing out a schematic of what it would look like. We might potentially launch a Kickstarter campaign to try to build one out in the future. We really want to at least pilot the concept. I would also love to do a documentary on it — I think the stories we’ll find when we actually get to these places are just as important to preserve as the content — the curators, the people who are looking over this stuff and how important it is. I get chills just thinking about it, but one step at a time. One step at a time.

Categories: Planet DigiPres

39 And Counting: Digital Portals to Local Community History

25 October 2013 - 6:18pm

Given the popularity of 71 Digital Portals to State History from last month–we got many comments with great additions to that list–I thought it would be useful to extend the conversation to the local level. Unlike for the earlier post, we did not have the services of an intern to do the research, so the starting list is shorter. But we were able to quickly pull together the list below of 39 county, municipal and other local institutions that provide online access to unique digital resources useful for studying the history and culture of their communities.

The starter list was assembled with some basic parameters. State-wide and regional portals are left out, although it would make sense to include them in a master list. The previous blog post, along with the details provided by the Library of Congress web page State Digital Resources: Memory Projects, Online Encyclopedias, Historical & Cultural Materials Collections, does a good job covering state-level portals. The list also aims to exclude local collections that are bundled within a state-wide portal. A number of states have done fine work coordinating with local institutions to put collections online, including Ohio Memory, Washington Rural Heritage and The State of Wisconsin Collection from the University of Wisconsin Digital Collections Center (just to name some). Scores of local institutions are represented through such portals. A more comprehensive list could, perhaps, specifically mention each of those entities.

The list is heavy with public libraries. Certainly other kinds of institutions are providing this kind of service as well and ultimately should be included in a more comprehensive accounting. On the other hand, the list does attempt to be broad in terms of collection size and coverage, as well as the level of technological sophistication used to present content. The list emphasizes access, although it is to be hoped that all the institutions named are relying on current best practices for digital preservation.

I know there are many more worthy sites out there. Please let us know what is missing from this starter list. We will compile suggestions into a more complete list and make it available. Think of it as a crowdsourcing project!




AL Birmingham Public Library
AR Fayetteville Public Library
AZ Scottsdale Public Library
CA Corona Public Library
CA Escondido Public Library
CA Los Angeles Public Library
CA Pomona Public Library
CA Sacramento Public Library
CA San Francisco Public Library
CO Denver Public Library
DC DC Public Library
FL Hillsborough County Public Library Cooperative
FL Manatee County Public Library
FL Winter Park Public Library
IA Iowa City Public Library
IL Chicago Public Library
IL Herrin City Library
IN Allen County Public Library
IN Indianapolis Public Library
IN Jefferson County Public Library
MA Boston Public Library
MA North of Boston Library Exchange (NOBLE)
MN Hennepin County Library
MO Kansas City Public Library
MO Springfield-Greene County Library District
NJ Plainfield Public Library
NY Brooklyn Public Library
OH Dayton Metro Library
OH Public Library of Cincinnati and Hamilton County
OK Tulsa City-County Library
PA University of Pittsburgh/Historic Pittsburgh
TN Knox County Public Library
TN Nashville Public Library
TX Houston Public Library
VA Norfolk Public Library
WA Everett Public Library
WA Seattle Public Library
WI La Crosse Public Library
WI Milwaukee Public Library
Categories: Planet DigiPres

New NDSA Report: Issues in the Appraisal and Selection of Geospatial Data

24 October 2013 - 2:47pm

Digital mapping information is an essential part of the backbone of our economy through the now-widespread consumer applications that allow us to track our location, find a nearby restaurant or guide us on a journey. While largely invisible to the casual user, the geospatial data that underpins these applications allows us to know what our landscape looks like at this instant, and increasingly, to see where we’ve been at a level of previously unimagined detail.

While these tools are clearly incredibly valuable, it’s not entirely clear how we manage and preserve the equally valuable geospatial data that underpins the applications to ensure that we have the ability to analyze changes to our landscape over time.

With that in mind, we are pleased to announce the release of a new National Digital Stewardship Alliance report, “Issues in the Appraisal and Selection of Geospatial Data” (pdf).

The report has been a long time coming. It began life as a position paper released in advance of the Nov. 2010 Digital Geospatial Appraisal meeting held at the Library of Congress. The position paper was authored by Steve Morris, the Head of Digital Library Initiatives and Digital Projects at the North Carolina State University libraries, who has a long engagement with NDIIPP through his work on both the North Carolina Geospatial Data Archiving and GeoMAPP projects.

The latest iteration of the report took Mr. Morris’ work and applied the expertise of the Geospatial Content Team of the National Digital Stewardship Alliance to consider both appraisal and selection activities as they affect decisions defining geospatial content of enduring value to the nation.

The report provides an illuminating background on the problem area, then suggests ways to establish criteria for appraisal and selection decisions for geospatial data. It then proposes some models and processes for appraisal and selection, including tools for the identification and evaluation of data resources and triggers for appraisal and selection,  and finishes with further questions for the community to explore.

Now we’ve got some guidance. What are the next steps?

Categories: Planet DigiPres

Using Viewshare to Visualize Conference Tweets

23 October 2013 - 4:51pm

This is a guest post from Camille Salas, the former Viewshare outreach librarian extraordinaire. Camille recently completed her internship and temporary assignment with the Library of Congress; we hope to feature her outstanding work again as a guest author on this blog once she’s landed a new gig. Best of luck, Camille!

Following is an interview with Merinda Hensley and Thomas Padilla of the University of Illinois Library about their project to create a visualization of conference tweets. Merinda Hensley is the Instructional Services Librarian and Co-Coordinator of the Scholarly Commons at the University of Illinois at Urbana-Champaign. Thomas Padilla is a Graduate Assistant in the Scholarly Commons. Thomas recently created a tutorial on how to use Viewshare and ScraperWiki to capture and display tweets from conferences. The tutorial was posted to the Scholarly Commons blog, Commons Knowledge.

Merinda Hensley, University of Illinois at Urbana-Champaign

Camille: Please tell us about the Scholarly Commons at the University of Illinois at Urbana-Champaign (UIUC). What is its role in the UIUC community?

Merinda: The Scholarly Commons, a unit of the University of Illinois Library, opened in August 2010 to serve the emerging needs of faculty, researchers and graduate students pursuing in-depth research and scholarly inquiry. We consult with researchers on issues related to data-intensive work and digital humanities, answer copyright questions, help with the digitization of materials, have workstations to support web and computer usability, and administer a series of open workshops on myriad topics geared towards the research needs of faculty and graduate students. We also partner with several campus constituencies in order to bring together services across campus. Our space offers software and hardware to conduct research including completing tasks such as text-encoding, working with qualitative and quantitative data, and digitization.

Camille: How do you each contribute to the work at the Commons?

Merinda: As Co-Coordinator of the Scholarly Commons, I work to support new and developing services and events related to the advanced research needs of graduate students and faculty. I coordinate our workshop series, the Savvy Researcher, and supervise the pre-professional graduate assistants from the Graduate School of Library and Information Science. Thomas has been an excellent addition to our team with his interests in technology and the humanities.

Thomas Padilla, University of Illinois at Urbana-Champaign

Thomas: As a Graduate Assistant at the Scholarly Commons, I work with a team of librarians and graduate assistants from the Graduate School of Library and Information Science to provide consultation services to patrons, monitor and learn new tools and resources, market Scholarly Commons services, teach workshops, and contribute to digital humanities projects.

Camille: Thomas created a short tutorial on using ScraperWiki and Viewshare to capture and display conference Twitterstreams for the Scholarly Commons blog. For those unfamiliar with ScraperWiki, please describe what it is.

Thomas: ScraperWiki is a web-based data scraping platform that helps novice and advanced users “scrape” publicly available data from websites like Twitter and Flickr. Novice users can use pre-made “tools” to scrape data and more advanced users can create their own data scrapers.

Camille: Other Viewshare users have described applications they used with Viewshare in previous blog posts. What led you to use ScraperWiki with Viewshare? What is the goal of using them together?

Thomas: Sharing and conversing about tools, resources, methods, and issues like professional ethics are often prominent themes in the conference Twitterstream. While some tools exist to capture this information, I was looking for a method that would capture, display, and make the data underlying the visualizations shareable. I chose ScraperWiki to capture the Twitter data because it offers an accessible method for a novice user to scrape Twitter data – I wanted individuals interested in trying this to face as low an initial barrier to entry as possible.

I chose Viewshare because it offers a user-friendly interface, it enables spatial and temporal visualizations, it is maintained by a public institution, and it makes the data underlying visualizations shareable in multiple formats, thus affording the possibility for others to interpret, combine, and remix the data as they see fit.

Timeline Display from the DHOXSS 2013 Twitterstream

Camille: Briefly walk us through how you would use the two tools together.

Thomas: I use a pre-made ScraperWiki “tool” to scrape all Twitter data associated with a hashtag. After the data is scraped, it is downloaded in Microsoft Excel format. The next step is to clean the data and remove fields that contain data that is not well represented in the overall dataset.

Once the data is ready, the files are uploaded to Viewshare, data types are assigned to the various fields (columns from the spreadsheet), views are built, and widgets are assigned to add search and filtering functions.
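The cleaning step described above, removing fields that are poorly represented in the dataset, could be sketched roughly as follows. This is only an illustration, not the method used in the tutorial: the function name `drop_sparse_fields`, the `min_fill` threshold, the field names, and the sample rows are all assumptions made for the example.

```python
from typing import Dict, List


def drop_sparse_fields(rows: List[Dict[str, str]], min_fill: float = 0.5) -> List[Dict[str, str]]:
    """Return copies of the rows without fields that are filled in
    for fewer than `min_fill` (a fraction from 0 to 1) of the rows."""
    if not rows:
        return []
    total = len(rows)
    keep = []
    for field in rows[0].keys():
        # Count rows where this field holds a non-blank value.
        filled = sum(1 for row in rows if (row.get(field) or "").strip())
        if filled / total >= min_fill:
            keep.append(field)
    return [{field: row.get(field, "") for field in keep} for row in rows]


# Invented sample rows standing in for a scraped hashtag export.
tweets = [
    {"user": "alice", "text": "Great keynote", "lang": "en", "geo": ""},
    {"user": "bob", "text": "Slides are online", "lang": "en", "geo": ""},
    {"user": "carol", "text": "Bonjour", "lang": "fr", "geo": "51.75,-1.26"},
]

cleaned = drop_sparse_fields(tweets, min_fill=0.5)
print(list(cleaned[0].keys()))  # the mostly empty "geo" field is dropped
```

The threshold is a judgment call: a field such as location may be sparse but still worth keeping if a map view is planned.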

Camille: What do you hope conference attendees and even non-attendees might glean from visualizing tweets? Similarly, how do you see projects like this assisting the work of researchers and scholars?

Thomas: By visualizing tweets, conference attendees and non-attendees are provided multiple ways of orienting themselves to topics that arise during an event and to the individuals who are contributing to those topics. The timeline visualization can help users to understand the frequency of tweets as the conference progresses.

This value can be enhanced by filtering the timeline according to factors like the language the tweet occurred in or by hashtag to focus on specific topics, tools, or resources. The map view can give users a sense of local and external participation in the discussion of a topic. Simple pie charts can be used to quickly understand which users tweeted the most, or at an international conference, what languages were used and what proportion of the overall conversation they represent.
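The proportions behind such a pie chart are simple to compute by hand. The sketch below is not how Viewshare builds its charts; it only shows the underlying arithmetic, and the function name `shares` and the sample language codes are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def shares(values: Iterable[str]) -> Dict[str, float]:
    """Map each distinct value to its fraction of the total, largest first."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {value: count / total for value, count in counts.most_common()}


# Invented sample: one language code per tweet.
tweet_languages = ["en", "en", "fr", "en"]
language_shares = shares(tweet_languages)
print(language_shares)  # {'en': 0.75, 'fr': 0.25}
```

Passing a list of usernames instead of language codes gives each user's share of the conversation in the same way.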

While often maligned, the word cloud widget becomes useful in this context because words in the cloud filter the visualizations. Outliers in the word cloud are quickly apparent and the path toward finding the outlier tweets is short. These are just a few examples of what is possible when using Viewshare to visualize Twitter data gathered by ScraperWiki.

As previous blog posts about Viewshare have shown (Bill Amberg, Jeremy Myntti, Violeta Ilik), the tool can support projects in many different domains and professional settings. While it can handle data in a few different formats, I think it has immediate appeal for a large group of researchers and scholars in its ability to visualize data held in the Microsoft Excel file format.

This is not an endorsement of the file format; rather, it is just an acknowledgement that use of the file format is fairly ubiquitous across disciplines, and that lends itself well to a common starting point for many different scholars to use Viewshare to iterate through visualizations of their data, hopefully gaining useful perspective along the way.

Camille: What other visualizations or capabilities could Viewshare offer to enhance your project or future projects at the Scholarly Commons?

Thomas: I agree with Jeremy Myntti that it would be useful to be able to edit metadata within Viewshare rather than exporting, refining, and re-uploading. I think this feature would help users iterate through visualizations in a more streamlined way.

Camille: Have you received any feedback about using these tools together yet? Aside from the Commons Knowledge blog post, how do you plan on sharing this tutorial with faculty and students at UIUC?

Viewshare Workshop Announcement

Thomas: So far we have not received much feedback on using the tools together, though we would definitely welcome it! My post on using Viewshare and ScraperWiki is the first in a three-part series. The next post will feature Martin Hawksey’s Twitter Archiving Google Spreadsheet (TAGS) and the post after that will focus on the various insights that can be gained from using either tool. Aside from the blog, I will teach two Viewshare-focused workshops this semester. While I will not be talking about combining Viewshare and ScraperWiki in those workshops, I will be focusing on how Viewshare can work with data from different disciplines.

Camille: Merinda, in your role supervising graduate assistants from the GSLIS, what types of skills do you find increasingly important for new librarians to have — especially for those who want to work in library units such as the Scholarly Commons? The tutorial seems like a good example of what new librarians can offer the field.

Merinda: Librarians new to the field can improve their chances of working in an environment similar to the Scholarly Commons by keeping up to date on how technology intersects with research in all disciplines. The Chronicle of Higher Education and Inside Higher Ed are both great places to watch for developments in all fields but getting to know the researchers in your field and the struggles they face is often the best way to begin thinking about how librarians can participate in the solution. We have to be flexible and willing to learn from our mistakes along the way as we adapt to new research strategies. Thomas’ work with Viewshare is an excellent example of exploring options for researchers who are increasingly depending on technology to support their research.

Categories: Planet DigiPres

Archiving Web Content? Take the 2013 NDSA Survey!

22 October 2013 - 2:54pm

The following is a guest post by Abbie Grotke, Library of Congress Web Archiving Team Lead and NDSA Content Working Group Co-Chair.


Not that kind of web! Spider Web by user .curt on Flickr

Are you or your employer involved in archiving web content? If so, you may be interested in the National Digital Stewardship Alliance’s (NDSA) second biennial survey of U.S. organizations that are actively involved in or planning to archive web content. In Fall 2011, the NDSA Content Working Group conducted its first survey of U.S. organizations that were doing or about to start web archiving. We blogged about the results of the survey here on the Signal, and published a report in 2012.

On the two-year anniversary of the original, the NDSA is releasing an updated survey to continue to track the evolution of web archiving programs in the U.S. Our goal in conducting these surveys is to better understand the U.S. web archiving landscape: similarities and differences in programmatic approaches, types of content being archived, tools and services being used, access modes being provided, and emerging best practices and challenges. As more institutions tackle web archiving, this type of information gathering and reporting not only raises awareness of the types of activities underway, but helps those preserving web content make the case for archiving back at their home institutions.

For those of you who took the previous survey, you’ll notice some key differences for the 2013 survey: more streamlined answers based on the free-text responses from the last survey, and an increased focus on policy.

As before, the aggregate responses will be reported to NDSA members and summary results will be shared publicly via this blog and elsewhere.

Any U.S. organization currently engaged in web archiving or in the process of planning a web archive is invited to take the survey – click on this link to get started. If you’d like to preview the survey before answering, we have a PDF of the questions available.

The survey will close on November 30, 2013.

I’d like to take a moment to thank some of my NDSA colleagues who stepped up during the government shutdown to help finalize and prepare the survey while the Library of Congress was closed — particularly Nicholas Taylor at Stanford University Libraries, Jefferson Bailey at METRO, Cathy Hartman from University of North Texas Libraries (and my co-chair on the Content Working Group), Kristine Hanna at the Internet Archive, and Edward McCain at the Reynolds Journalism Institute/University of Missouri Libraries. Thanks to these folks, we are able to get this survey launched today!

Categories: Planet DigiPres

Preserving.exe Report: Toward a National Strategy for Preserving Software

21 October 2013 - 1:50pm

Shelved Software at the Library of Congress National Audio-Visual Conservation Center

Our world increasingly runs on software. From operating streetlights and financial markets, to producing music and film, to conducting research and scholarship in the sciences and the humanities, software shapes and structures our lives.

Software is simultaneously a baseline infrastructure and a mode of creative expression. It is both the key to accessing and making sense of digital objects and an increasingly important historical artifact in its own right. When historians write the social, political, economic and cultural history of the 21st century they will need to consult the software of the times.

I am thrilled to announce the release of a new National Digital Information Infrastructure and Preservation Program report, Preserving.exe: Toward a National Strategy for Preserving Software, including perspectives from individuals working to ensure long term access to software.

Software Preservation Summit

On May 20-21, 2013, NDIIPP hosted “Preserving.exe: Toward a National Strategy for Preserving Software,” a summit focused on meeting the challenge of collecting and preserving software. The event brought together software creators, representatives from source code repositories, curators and archivists working on collecting and preserving software and scholars studying software and source code as cultural, historical and scientific artifacts.

Curatorial, Scholarly, and Scientific Perspectives

This report is intended to highlight the issues and concerns raised at the summit and identify key next steps for ensuring long-term access to software. To best represent the distinct perspectives involved in the summit, this report is not an aggregate overview. Instead, the report includes three perspective pieces: a curatorial perspective, a perspective from a humanities scholar and the perspective of two scientists working to ensure access to scientific source code.

  • Henry Lowood, Curator for History of Science & Technology Collections at Stanford University Libraries, describes three lures of software preservation in exploring issues around description, metadata creation, access and delivery mechanisms for software collections.
  • Matthew Kirschenbaum, Associate Professor in the Department of English at the University of Maryland and Associate Director of the Maryland Institute for Technology in the Humanities, articulates the value of the record of software to a range of constituencies and offers a call to action to develop a national software registry modeled on the national film registry.
  • Alice Allen, primary editor of the Astrophysics Source Code Library, and Peter Teuben, University of Maryland Astronomy Department, offer a commentary on how the summit has helped them think in a longer time frame about the value of astrophysics source codes.

For further context, the report includes two interviews that were shared as pre-reading with participants in the summit. The interview with Doug White explains the process and design of the National Institute of Standards and Technology’s National Software Reference Library. The NSRL is both a path-breaking model for other software preservation projects and already a key player in the kinds of partnerships that are making software preservation happen. The interview with Michael Mansfield, an associate curator of film and media arts at the Smithsonian American Art Museum, explores how issues in software preservation manifest in the curation of artwork.

The term “toward” in the title is important

This report is not a national strategy. It is an attempt to advance the national conversation about collecting and preserving software. Far from providing a final word, the goal of this collection of perspectives is to broaden and deepen the dialog on software preservation with the wider community of cultural heritage organizations. As preserving and providing access to software becomes an increasingly large part of the work of libraries, archives and museums, it is critical that organizations recognize and meet the distinct needs of their local users.

In convening, hosting, and reporting out on events like this, it is our hope that we can promote a collaborative approach to building a distributed national software collection.

So go ahead and read the report today!

Categories: Planet DigiPres

DPOE Train the Trainer, Alaska Edition

30 September 2013 - 4:03pm

The following is a guest post by Jeanette Altman, a Digital Projects Professional at the University of Alaska Fairbanks.

For many Alaskans, it’s not uncommon to be just slightly out of step with the rest of America. Things that might be easily obtainable Outside (that’s the Lower 48 to you) come at a premium here. Free shipping? Not to Alaska!

So when some of us here at the University of Alaska Fairbanks’ Elmer E. Rasmuson Library first got wind of the Library of Congress’ Digital Preservation Outreach and Education program’s Train the Trainer workshops, we asked, “When are you expanding to include Alaska?” After receiving the disappointing but altogether unsurprising news that there were no such plans, George Coulbourne, Executive Program Officer at the Library of Congress, offered us an opportunity for a collaborative partnership. One year later, Train the Trainer, Alaska Edition, was born.

Now in its third year, DPOE seeks to foster national outreach and education about digital preservation, using a Train the Trainer model to reach as many people as possible.  Participants are trained in DPOE’s baseline curriculum, and then given the tools they need to build their own teaching network after they return to their communities.

Participants in the Aug. 27-29 DPOE Alaska Train the Trainer Workshop. Photo Credit: Catherine Williams

The August 27-29 workshop in Fairbanks, Alaska, was hosted by the University of Alaska Fairbanks’ Elmer E. Rasmuson Library, and made possible by the generosity of the Alaska State Library and the Institute of Museum and Library Services. Participants from throughout the state of Alaska were flown in to Fairbanks for the three-day training. Twenty-four participants now join the growing network of 87 “topical trainers” across the United States, and are the first in the state of Alaska.

Rasmuson Library opened the application process to Alaska residents in May of 2013. Participants were flown in from various regions of Alaska such as Kotzebue, Igiugig, and Skagway, and represented a myriad of organizations including the National Park Service, Alaskan tribal libraries, cultural foundations, and various museums and libraries.

“I so enjoyed participating in the workshop, and feel invigorated by all that I learned over the three day event,” said Angie Schmidt, a workshop participant and film archivist with the Alaska Film Archives. “Being able to interact and form contacts with leaders from the Library of Congress and other institutions, as well as colleagues from around the state was especially valuable. The framework provided for initiating and carrying through on digital preservation projects will be so beneficial to us all in coming months and years.”

On the first day of the training, six groups were formed to focus on each of the DPOE modules: Identify, Select, Store, Protect, Manage and Provide. The diversity of the participant population was a valuable addition overall, as each group brought aspects of their cultural heritage and experience to their presentations.  The workshop provided time for networking and sharing of resources and experience, which has already led to further collaboration between Rasmuson Library and other state organizations. We hope that we can use this event as a starting point to find the right partners and funders to build out a digital preservation community in Alaska, including more Train the Trainer sessions, technical skills training, and investments in infrastructure.

“Alaska now has their first group of trained digital preservation practitioners,” Coulbourne noted in the event’s closing. “You all have the unique potential to collaborate across the state and use your newly acquired skills to enhance your communities’ efforts to preserve and make available the rich cultural heritage and treasures held by the Native Alaskan people.”

Robin Dale of LYRASIS, Mary Molinaro of the University of Kentucky and Jacob Nadal of the Brooklyn Historical Society continued their tradition of serving as lead or “anchor” instructors. Their generosity, their organizations’ commitment, and the Library of Congress focus on this national effort allowed the DPOE Train the Trainer Program to be offered to attendees from remote areas of Alaska who otherwise may not have been able to attend this critical skill-building program in digital stewardship.

It was obvious to me that one of DPOE’s most valuable attributes is cost-effectiveness. The cultural heritage community needs quality training at a low cost. Digital preservation is a critical skill set, but training current staff is often too expensive for smaller institutions or states such as ours where access to in-person training is very challenging if not impossible during certain times of the year. This program has helped the Rasmuson Library staff to work with the state’s professional and Native Alaskan organizations to preserve our rich history, folklore, and traditions in digital form. I hope the community formed at this training event will raise the level of digital preservation practice, forge new partnerships, and bring more Alaskans and their valuable collections up to speed with digital stewardship.

Categories: Planet DigiPres

Content Matters: An Interview with Edward McCain of the Reynolds Journalism Institute

27 September 2013 - 4:22pm

For this installment of the Content Matters interview series of the National Digital Stewardship Alliance Content Working Group, I interviewed Edward McCain, digital curator of journalism at the Donald W. Reynolds Journalism Institute and University of Missouri Libraries. University of Missouri Libraries joined the NDSA this past summer.

Edward McCain. Photo by Jennifer Nelson/RJI.

Ashenfelder: What is RJI’s relationship to the University of Missouri School of Journalism?

McCain: RJI is a sort of sister organization of the University of Missouri School of Journalism. We work closely with the faculty and staff there. The J-School produces the journalists of the future and RJI is a think tank that works to ensure and help direct the future of journalism.

Ashenfelder: You said that one of the motivations for RJI joining the NDSA was the Columbia Missourian’s loss of 15 years of digital newspaper archives in a server crash. Can you tell us about that event and why this content is so important to preserve?

McCain: The Columbia Missourian is a daily newspaper operated by the University of Missouri School of Journalism that has served this mid-Missouri community since 1908.

According to 2006 and 2008 reports by Victoria McCargar, a 2002 Missourian server crash wiped out fifteen years of text and seven years of photos. The archive was contained in an obsolete software package that effectively prevented cost-effective retrieval. The content that was lost represents a kind of “memory hole,” albeit not the intentional variety described in Orwell’s “1984.”

The disappearance of 15 years of news, birth announcements, obituaries and feature stories about the happenings in any community represents a loss of cultural heritage and identity. It also has an effect on the news ecosystem, since reporters often depend on the “morgue” (newspaper parlance for their library) to add background and context to their stories.

In other parts of the information food chain, radio and television newscasts often rely on newspapers as the basis for their efforts. This, in turn, can have an effect on the democratic process, since the election process benefits from an accurate record of the candidates’ words and actions. All this lends credence to Washington Post publisher Phil Graham’s statement that journalism is “a first rough draft of history.”

Ashenfelder: You began your career as a photojournalist. How did you get into library science?

McCain: I earned my Bachelor of Journalism degree here at Mizzou and worked in the field for over 30 years, operating my own business for the past twenty. One of McCain Photography’s profit centers has been and continues to be the sale of stock photography, which is based on my image archive.

I eventually found myself reading about controlled vocabularies, databases, metadata and other library science concepts in my spare time. I enjoyed the challenge of structuring information in a way that adds value to content. One day I called the University of Arizona’s School of Information Resources and Library Science program, and was connected to Dr. Peter Botticelli. I asked him a lot of questions. That phone conversation, plus the fact that the SIRLS master’s degree could be combined with the Digital Information Management (DigIn) certificate program, helped me decide to take the leap back into academia.

Ashenfelder: And then you came back to Missouri and joined RJI. What do you bring to RJI as its new digital curator?

McCain: From my perspective, the most important qualities I bring are imagination, the spirit of entrepreneurship and an ability to get things done. All human endeavors begin with a dream, the ability to visualize new possibilities. I’ve been a successful businessman, but more important is what I’ve learned over the years: the only failure is not owning your mistakes and learning from them so you can do better next time. To me, accomplishing things is often about having clear priorities and not caring who gets the credit; keeping egos (including my own) out of the way.

Those qualities, combined with my knowledge and experience as a journalist, photographer, software developer, businessman and library scientist, all come into play in my new position. I’m still a bit amazed that MU Libraries and the Reynolds Journalism Institute created what I consider the perfect position for my skill set and interests at just the right time. And that as a result, I found my dream job.

Ashenfelder: The system you want to create will be able to archive the work of journalists from the newspaper, radio and TV. Can you broadly describe some of the requirements for such a system? What will it need to do in order to serve all of its stakeholders?

McCain: To be clear, we’re still in the embryonic phase of the software development process and we have a lot of research to do in terms of functional and technical requirements. It does seem likely that the framework will have to be modular, extensible and generally able to play well with others.

Obviously, the system will need to accommodate a wide range of file formats and packages during and across the processes needed during the life cycle of digital objects. I believe that we should be able to combine and build on existing open-source platforms to achieve this and more.

From early conversations with the three local media stakeholders, I imagine that they are going to be focused on search functionality and speed. That means that they want to find relevant content quickly and access and integrate it into their workflow seamlessly.

We are going to spend quite a bit of time optimizing search and workflow, but once we have a handle on those issues, there will be opportunities for collaboration within and between all three media outlets that will improve their efficiency and enhance the experience for their respective audiences.

Ashenfelder: One of your first tasks is to create a plan for such a system. What research are you doing as you develop that plan?

McCain: The problems surrounding preservation of and access to digital news archives stem from a combination of frequently changing factors. I’m employing an approach adapted from the Build Initiative, which has successfully produced change in the area of education.

The Build Initiative framework is based on change theory and focuses on five broad interconnected elements: context, components, connections, infrastructure and scale. Having this kind of framework allows me to keep the big picture in mind when making decisions.

For example, one of our components provides a new business model for digital news archives. In order to successfully support this service, we need to work in the infrastructure area to create the open-source software required to implement the new model. As in most real-life systems, there are many interconnections between these components. The key is to identify segments where positive outcomes in one realm can spread synergistically into others and continue to build on those successes.

Ashenfelder: You said you would like to share RJI’s system with other people, especially smaller towns and smaller institutions, so their history won’t be lost. Can you please tell us more about that?

McCain: Journalism is struggling to find sustainable and profitable business models. Print advertising revenue is less than half of what it was in 2006 and the number of newspaper journalists has declined by 27 percent since peaking in 1989. This is particularly true in smaller towns and rural areas. Once those businesses close their doors, there is an increased likelihood that their archives, especially those in digital formats, will be lost forever. That’s why I feel it imperative to address issues relating to current and future business models involving news archives.

By creating open-source software, we hope to offer these struggling enterprises new possibilities for generating revenues from their archives. For example, we can assist these organizations in setting up cooperative efforts that allow multiple archives to reside on a single server. That would keep costs low and participants would benefit from a larger pool of content, which is generally more attractive to potential customers, ranging from research services to individual users.

In addition, for those enterprises that don’t want to deal with setting up their own server or establishing a co-op, we would like to leverage the efficiencies of the Missouri University IT system to provide our system as a service at an affordable cost.

Since humans tend to save what they value, we will prioritize our programs to support private enterprise’s ability to profit from their archives. Once those archives are seen as valuable assets, they will be preserved and accessed. But in cases where that outcome isn’t realized, part of our initiative involves working as an intermediary between news archive owners and cultural heritage institutions to facilitate the safe transfer of resources to an appropriate location.

Ashenfelder: There are potential opportunities for RJI to collaborate with other institutions, such as the Missouri Press Association and the State Historical Society of Missouri.

McCain: Interestingly, the State Historical Society of Missouri was established by the Missouri Press Association in 1898 and subsequently assumed by the state. They are both significant players in newspaper preservation and access.

I spoke to the MPA board a few weeks ago and found definite interest in working with RJI and the J-School to advance the cause of news archive preservation and access. I spoke with several publishers who expressed a willingness to experiment with our software and other services at an appropriate time in the process. SHS has been participating in the National Digital Newspaper Program since 2008 and has valuable experience in working with those and other analog and digital news collections.

Ashenfelder: Much of news content comes from businesses and the private sector. How do you intend to interest profit-oriented companies in RJI’s archive and repository?

McCain: My position is charged with preservation of and access to news archives, whether public or private. While the NDNP continues to do amazing things, there is a gargantuan amount of archival content in the private sector that we probably can’t address with public funding alone. This is one reason why, in its landmark 2010 report “Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information,” the Blue Ribbon Task Force on Sustainable Digital Preservation and Access stated the need to “provide financial incentives for private owners to preserve on behalf of the public.”

In light of current funding models for archives in the U.S., it makes perfect sense to work with people in the private sector to demonstrate the potential value of their archive and to assist them in realizing it. If news executives see archives as a profit center instead of a burden, my hope is that those resources will stay viable until they enter the public domain and can be accessed and preserved by other means.

News organizations are businesses and if decision-makers don’t see value in keeping their archives, they have little incentive to preserve them–or even donate them–given current laws that don’t incentivize such transfers to cultural heritage institutions. We plan to address those and other issues in the future by launching efforts in the Context component of our initiative.

Ashenfelder: Can you tell us more about the digital news summit that you are planning at RJI next spring?

McCain: In the spring of 2011, RJI, MU Libraries and Mizzou Advantage hosted the first Newspaper Archive Summit. My colleague Dorothy Carner, Head of Journalism Libraries, was instrumental in bringing together publishers, digital archivists, journalists, librarians, news vendors and entrepreneurs to begin a conversation about how best to approach the challenges with which we are currently presented.

Dorothy and I see the next part of that ongoing conversation as a kind of “break out” group focused on dialoging with decision-makers and their influencers in order to better understand their perspectives on access and preservation of archives. Undoubtedly, a large part of the next conversation will involve finding better ways to generate profits from archival resources.

In light of his recent purchase of The Washington Post, we’ve extended an invitation to Jeff Bezos, CEO of Amazon, to speak at the summit next April. I’m not sure he will attend but I think he’s a logical choice as a speaker for the following reasons.

1) It’s no accident that Mr. Bezos started Amazon by selling books, which is another word for content. By establishing relationships with book buyers, Amazon was able to access uniquely useful information about individual tastes and interests that could then be used to customize its marketing of all kinds of other merchandise.

2) Bezos used the Internet to develop a long-tail merchandising platform that could exploit low overhead in order to profit from even rarely ordered items. Most brick and mortar stores can only carry an inventory of high-volume merchandise because their overhead makes selling unpopular items prohibitively expensive. Combine these two effects and – voilà! Amazon becomes the world’s largest online retailer.

I invite you to take a moment to imagine you were Jeff Bezos and had just purchased a business with a lot of potentially valuable content cleverly disguised as a news archive. What would you do with it?


What kind of content matters to you? If you or your institution would like to share your story of long-term access to a particular digital resource, please email and in the subject line put “Attention: Content Working Group.”

Categories: Planet DigiPres

Society of American Archivists Awards ANADP conference paper with the 2013 Preservation Publication Award

26 September 2013 - 2:11pm

The following is a guest post from Michael Mastrangelo, a Program Support Assistant in the Office of Strategic Initiatives at the Library of Congress.

During the Society of American Archivists Annual Conference in New Orleans in August, the NDIIPP-supported initiative Aligning National Approaches to Digital Preservation (ANADP) received the prestigious Preservation Publication Award for 2013. ANADP is a 327-page collection of peer-reviewed essays that establishes 47 goals and strategies to merge the national digital preservation efforts of nations throughout the European Union and the United States.

The Preservation Publication Award goes to outstanding preservation works, nominated by peers and reviewed by an SAA committee. SAA awarded this paper because it “…broadens and deepens its impact by reflecting on the ANADP presentations,” and “…highlights the need for strategic international collaborations.” ANADP is written for information professionals from librarians to administrators, so it will have a broad impact on the whole information field, sparking cross-industry collaboration in addition to cross-border collaboration.

The honor goes to ANADP’s volume editor Nancy McGovern, the Head of Curation and Preservation Services at the MIT Libraries, series editor Katherine Skinner, the Executive Director of the Educopia Institute, and the section co-authors, including representatives of the publication’s main sponsor, the Library of Congress, as well as experts from the Joint Information Systems Committee, Open Planets Foundation and other national and international organizations.

The ANADP conference was conceived from brainstorming sessions between the Educopia Institute, the Library of Congress, the University of North Texas, Auburn University, the MetaArchive Cooperative and the National Library of Estonia. In 2011, 125 delegates from 20 countries met in Tallinn, Estonia where they shared their national digital preservation practices. Delegates divided the work to create an overarching plan for furthering international collaboration by authoring a number of separate “alignments” across organizations, legal regimes, technical issues, economic approaches, standards and education.

The technical alignment panel discussed infrastructure like LOCKSS (Lots of Copies Keep Stuff Safe), while the organizational panel covered cost-efficiencies and vendor relations. The standards panel noted that many standards are impractical or overly detailed, making them inaccessible to smaller institutions. The copyright/legal panel mentioned the complicated laws on orphan works across jurisdictions, noting that conflicting copyright laws complicate preservation even across Europe’s fluid borders.

On the final day, the education panel stressed internships for bridging theory and practice, and George Coulbourne of the Digital Preservation Outreach and Education initiative suggested corporate partnerships to fund hands-on post-graduate development. Finally, the economics panel tackled the difficult question of shrinking budgets and identified successful funding models in projects like the congressionally funded NDIIPP, and JISC, a public charity with non-profit arms.

ANADP II is planned for November 18-20, 2013 in Barcelona. International digital stewardship leaders will reconvene to track progress toward collaboration and develop specific preservation actions for each collaborator to implement.

“I hope that we’ll delegate specific tasks to all the representatives to get the ball rolling on the action items in ANADP I,” said Mary Molinaro, the Associate Dean for Library Technologies at the University of Kentucky and a member of the DPOE Steering Committee. “We created an exciting plan for international collaboration with that first publication, now we just need to execute it.”

Categories: Planet DigiPres