The Signal: Digital Preservation
In The Is of the Digital Object and the Is of the Artifact I explored the extent to which digital objects confound and complicate some of our conceptions of what exactly digital things are. I’m becoming increasingly convinced that the nature of digital objects offers an important opportunity for the cultural heritage community to consider how some of our core philosophies work when the nature of the objects we work with changes. If we step back from the representations of digital objects on the screen, and think about them as sequences of bits that exist on particular media, I think some core archival principles have much to offer us.
Media/Medium as Fonds
Whatever your feelings about the imperative to respect des fonds, it is a cornerstone of the identity and professional practice of archives. Attempting to maintain the original order in which materials were managed before being accessioned, and making decisions when processing an archive with respect to the whole, both suggest a kind of archeological or paleontological understanding of documents, records and objects. An object’s meaning is always to be understood in the context of the objects near it and the structure in which it is organized.
To what extent does the relationship of the bits on a medium to the other bits on that medium represent a parallel kind of context? The structure and organization of records and knowledge says as much about the materials as what is inside them. The layers of sediment in which something is found enable you to understand its relationship to other things. Context is itself a text to be read.
The Original Order of Bits
All digital objects actually exist first as analog objects, as bits encoded on a particular medium. At that level, the bits that exist on a particular medium have an original order. At the moment of accession, there is a linear set of bits on any particular piece of media. The hard drive, the optical disc, the 5-inch floppy: each has on it an ordered set of bits that can be copied and made sense of by various technologies, both today and in the future. This is similarly true at the file level. Setting aside the actual physical arrangement of individual bits on a disk, each file is composed of a sequence of ones and zeros that come in an order. The fixity check tells us if this order has been altered.
Normalization is Interpretation
When we decide to normalize, or to copy only the representations of digital objects as they are rendered in particular situations, we are effectively disregarding the original order of the bits on the media. If we copy over the directory structure of files, we can still preserve a good bit of the user-perceived order and context. We can see what things were next to each other in the metaphorical and iconographic folders on someone’s desktop. However, those representations are still (in a sense) translations. They are particular ways of seeing and understanding the underlying bits.
Beyond this, any attempt to normalize the files themselves, to derive other kinds of files from them, is a much deeper disregard for the ideal of respecting the integrity, order and structure of digital objects. In this case, even the screen-essentialist notion of the digital object is in question. Each of these moves to normalize, each of these transformations and degradations, moves us one step further away from bit-level fixity and authenticity, from the authenticity of the fixity check, and toward a kind of performance or restaging of the artifact. We get further and further from being able to assert that what we have is exactly what we were given. We become artists engaged in a performative interpretation or recreation of the artifact.
The Order and Logic of Digital Media
But why are we even talking about order? Wasn’t the entire point of the digital the end of linearity? Our experience of digital media is one of non-linearity. The first row of the database or the spreadsheet is reorganized based on parameters. The web is made of linked pages, created from a rhizomatic network of connections between nodes. While the representations of digital objects often appear non-linear, it is critical not to be seduced by the flickering and transitory view of digital objects provided by our screens. At the end of the day, every digital object is encoded on some medium, and that encoding is an ordered sequence of bits.
Letting go of representations and embracing the bits
To bring this whole discussion back from theory and into practice: when recently working with a set of files from floppy disks in a collection, I came across a set of files I couldn’t open. The extensions made no sense to me, or to anything else for that matter. I changed the extensions to .txt and opened them in a text editor. Lo and behold, they were mostly made of characters that my computer could interpret as text. I didn’t need to know what format the files were in to be able to make sense of most of their contents. I didn’t need a secret decoder ring. I could just tell my computer to pretend this particular set of bits we call a file is all text and show me what it sees.
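That move is easy to script as well. Here is a minimal sketch of the idea in Python; the file name is a hypothetical stand-in, and decoding as latin-1 is just one permissive guess, not a claim about the real format:

```python
# Read an unidentified file as raw bytes and show whatever looks like text.
from pathlib import Path

raw = Path("MYSTERY.DAT").read_bytes()  # hypothetical file name

# latin-1 maps every possible byte to a character, so nothing is discarded.
decoded = raw.decode("latin-1")

# Keep printable characters and whitespace; blank out the rest. This is a
# rough, pure-Python analogue of the Unix `strings` utility.
visible = "".join(ch if ch.isprintable() or ch in "\n\t " else "." for ch in decoded)
print(visible[:2000])
```

This is essentially what renaming the file to .txt and opening it in a text editor does: it asks the computer to treat the ordered bits as text and show what it sees.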
This isn’t just true for files with text in them. While you might not be able to play the disk image of a game, anyone can crack it open and look at the text files, various script files, texture files and audio files (in the order they exist) and understand them in context. Even the metaphorical folder names inside that disk image tell us about what is there.
Computers and software become the tools we can use to make sense of the stratigraphy of the disk, to interpret the order of bits. Imaging disks (logical or forensic) attends to that order.
This is a Guest Post by Abbie Grotke, the Library of Congress Web Archiving Team Lead and Co-Chair of the National Digital Stewardship Alliance Content Working Group.
We’re excited to finally announce something a team of Library staff has been involved with for over a year now – a big project to integrate the Library’s web archives into the rest of the loc.gov web presence. The result is the new “Archived Web Site” format type, which is now part of the Library’s main search function.
Part of an ongoing effort to better enable patrons and researchers to find and use our online materials more easily, this update provides, for the first time, a way for the Library’s web archive content to be searched alongside other formats such as books, photos, maps, periodicals and more.
This “soft launch” (or “we’re putting it out there but still have improvements to make”) includes content from eight of our publicly accessible web archives. All eight collections and others remain available at the Library of Congress Web Archives home (we won’t take that site down until all content has been migrated over). We have plans for more content to be released later this year, including some archived sites not already on the LCWA site.
Some of the features of this new release:
- Searching web archives directly from the Library’s homepage: When searching the larger loc.gov site, you can now select “Archived Web Site” as a facet to search archived sites, rather than going to a separate interface.
- Searching archived websites along with other Library content: You will now find records for our web archives intermixed with related content if you do a search across all formats. For instance, if you search the Library’s site from the home page for “Ann Telnaes” you not only see content from our Prints and Photographs division, but also an archived site that was captured as a part of a 2006 P&P project.
- Faceted searching: Users can narrow search results by date, collection name, contributors, subjects, locations and languages.
- Combined records: Previously, we had no elegant way to handle when a URL had changed or if a URL belonged to multiple collections — we usually had more than one catalog record and these were not linked. Now, these records combine to make it easier to find all of the content collected for any given web presence over time.
- Thumbnail browsing and featured items: We’ve now got thumbnail images for all of our archived sites (taken from the first date captured), which are integrated into the item records and in the search screens. Our content curators have also selected featured items for each of the eight collections, which are available on the collection pages.
- New viewer: Staff working on the project wanted to push the boundaries a bit and try some new ways of accessing and viewing the archived content, beyond what we’d been doing with access via our catalog records to the Wayback calendar view that most users of web archives are familiar with; so you’ll see a new viewer with this release that displays content from our Wayback Machine but in a new way.
Converting our web archives to the new format has presented some new challenges. Web archives aren’t like digitized images – they don’t appear neatly in a record and simply require a click on a “view larger” button to zoom in so researchers can inspect the content more closely. Our web archives try to replicate the look and feel of the sites we archived over time, and we’re working hard to ensure that the content works as expected, particularly within our new framework and new viewer.
In addition to working over the coming months on the additional LCWA content, we’ll continue to make iterative fixes and tweaks, and would welcome any feedback that you, as researchers using web archives, digital preservation experts, or just interested folks have.
Let us know what you think!
Web archives are unique in digital library collections. They contain numerous formats, versions and content types. Acquiring, preserving and providing access to web archives often requires specialized software, processes and skills. These factors can make it difficult for organizations to embark on web archiving and to integrate it into existing library departments and services.
The International Internet Preservation Consortium supported the Bibliothèque nationale de France in developing a workshop titled “How to fit in? Integrating a web archiving program in your organization.” The workshop took place November 26-30, 2012 at the BnF in Paris, France, and the reports and presentations from the workshop were recently published on the IIPC web site.
The aim of the workshop was to investigate the challenges and methods involved in integrating web archiving into the mainstream activities of a heritage institution: general institutional strategy, acquisition practices, IT operations, preservation and access. It was primarily designed for institutions starting out in web archiving, but also for those with some initial experience who are looking to explore new organizational schemes. Based mainly on the experience of the Bibliothèque nationale de France, speakers presented different aspects of the organization. To provide a contrasting point of view from another national library, the British Library was invited to give a presentation, while the Internet Memory Foundation and the Institut National de l’Audiovisuel presented other approaches to web archiving. The workshop was of special interest to staff of national libraries but could be useful to those working in other types of institutions as well.
The report contains lessons learned and an evaluation of the workshop. Also available are the support materials used during the workshop: the slides from the different presentations, an article on the BnF web archiving workflow and a bibliography on digital legal deposit at the BnF.
- Report and evaluation of BnF/IIPC workshop
- Putting it all together: creating a unified web harvesting workflow at the Bibliothèque nationale de France
- BnF digital legal deposit bibliography
- Workshop presentations
The workshop and the resulting reports from the BnF are part of the Education and Training program of the IIPC. The Library of Congress is a member of the IIPC Steering Committee.
The National Digital Stewardship Residency Program has reached a major milestone – the ten residents for the inaugural class have now been chosen! It was a very competitive selection process, and these ten new residents have proven themselves highly qualified to take on current and future challenges of digital stewardship work. They will arrive in Washington, D.C. in September to start their residencies – and we will provide updates here on The Signal as the program progresses this year and next.
See the Library’s press release below for the full list of residents and their assigned projects. Bravo to all!
The Library of Congress, in partnership with the Institute of Museum and Library Services, has selected 10 candidates for the inaugural class of the National Digital Stewardship Residency program. The nine-month program begins in September 2013.
The NDSR program offers recent master’s program graduates in specialized fields— library science, information science, museum studies, archival studies and related technology— the opportunity to gain valuable professional experience in digital preservation. Residents will attend an intensive two-week digital stewardship workshop this fall at the Library of Congress. They will then work on a specialized project at one of 10 host institutions in the Washington, D.C. area, including the Library of Congress. These projects will allow them to acquire hands-on knowledge and skills regarding collection, selection, management, long-term preservation and accessibility of digital assets.
The residents listed below were selected by an expert committee of Library of Congress and Institute of Museum and Library Services staff, with commentary from each host institution.
2013 National Digital Stewardship Residents
(Name; hometown; university; host institution; project description)
Julia Blase; Tucson, Ariz.; University of Denver; National Security Archive; to take a snapshot of all archive activities that involve the capture, preservation and publication of digital assets.
Heidi Dowding; Roseville, Mich.; Wayne State University; Dumbarton Oaks Research Library and Collection; to identify an institutional solution for long-term digital asset management, conduct research on a variety of software systems and draft an institutional policy for the appraisal and selection of content destined for preservation.
Maureen Harlow; Clayville, N.Y.; University of North Carolina, Chapel Hill; National Library of Medicine; to create a collection of web content on a specific theme or topic of interest such as medicine and art or the e-patient movement.
Jaime McCurry; Seaford, N.Y.; Long Island University; Folger Shakespeare Library; to establish local routines and best practices for archiving and preserving the institution’s digital content.
Lee Nilsson; Eastpointe, Mich.; Eastern Washington University; Library of Congress, Office of Strategic Initiatives; to analyze the future risk of obsolescence to digital formats used at the Library and work with Library staff to develop an action plan to mitigate those risks.
Margo Padilla; Oakland, Calif.; San Jose State University, Maryland Institute for Technology in the Humanities; to create and share a research report for access models and collection interfaces for born-digital literary materials. She will also submit recommendations for access policies for born-digital collections.
Emily Reynolds; Pleasantville, N.Y.; University of Michigan; The World Bank Group; to facilitate and coordinate the eArchives digitization project, resulting in the creation of a digitized and cataloged historical collection of key archival materials representing more than 60 years of global development work.
Molly Schwartz; Dickerson, Md.; University of Maryland; Association of Research Libraries; to strengthen and expand a new initiative on digital accessibility in research libraries by incorporating a universal design approach to library collections and services.
Erica Titkemeyer; Cary, N.C.; New York University; Smithsonian Institution Archives; to identify the specialized digital and curatorial requirements of time-based media art and establish a benchmark of best practices to ensure that the institution’s archives will stand the test of time.
Lauren Work; Rochester, N.Y.; University of Washington; Public Broadcasting Service; to develop and apply evaluation tools, define selection criteria and outline recommended workflows needed to execute a successful analog digitization initiative for the PBS moving image collection.
For more information about the National Digital Stewardship Residency program, including information about how to be a host or partner for next year’s class, visit www.loc.gov/ndsr/.
A few months back, during the Personal Digital Archiving 2013 conference, I was struck by how much interesting research was being done in the field of digital preservation. Everything from digital forensics to gamification, all of it thoughtful, much of it very practical and applicable. Still, I couldn’t help wishing that there was even more going on.
In NDIIPP we often interact with granting organizations and get a peek at the types of things proposers are hoping to get funded. While many useful things are proposed and funded, I’m struck more by the types of things I don’t see as often: proposals for practical, applied research that directly addresses long-standing digital stewardship challenges, or that builds on other stellar research to establish a focused advance toward solutions. Many of the issues that need more focus are the kinds of things that cause organizations to wait on digital stewardship because the problems aren’t solved yet.
So I started writing down a list of things that might merit further attention from researchers and funders. I haven’t done an exhaustive search to see what’s currently being done in these areas (please point things out in the comments!) nor have I thought through all the challenges of doing these types of research (that’s for the researchers!) but I do think these merit further attention.
My inspiration for encouraging applied research is the work NDIIPP did back in 2005 with the Archive Ingest and Handling Test project. The AIHT was designed to test the interfaces specified in the architectural model for NDIIPP. The researchers ended up discovering that “even seemingly simple events such as the transfer of an archive are fraught with low-level problems, problems that are in the main related to differing institutional cultures and expectations” (from its final report (PDF)).
The observations that came out of these discoveries, rather than being irritating sidebars to the “real research,” actually provide ample practical value to future researchers engaged in similar digital preservation activities.
The GeoMAPP project took a similar approach to try and surface unexpected results by having the participants transfer their geospatial data collections back and forth between the different states, exposing each to new approaches and the challenge of “last mile” transfer, storage and network infrastructures.
This is the kind of unexpected knowledge that can come out of applied research, the kinds of efforts that might be applied to some of the areas below:
Format Migration: What happens to any particular file when you migrate it from one version of software to another? What happens when you migrate from one software type to another, for example, converting files from one type of word processing software to another? What changes happen to the file and the information inside it, and can those changes be quantified and measured? How can we determine whether they have any import for digital preservation actions? Is it possible to do all of this at scale and to manage the changes in a coherent way?
Format obsolescence and the need to address it come up often in the digital stewardship community; the need has become something of a truism, and while obsolescence may be a vexing problem, there is still doubt about how acute it really is. Still, we’ll need answers to the questions above in order to determine whether addressing format obsolescence through migration is worth the cost of doing so.
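For what it is worth, even a very crude comparison can start to put numbers on migration change. The sketch below assumes, hypothetically, that the original and migrated files have each already been exported to plain text; it measures only textual survival, not formatting, metadata or embedded objects, which is exactly why the questions above call for deeper applied research:

```python
# Compare the extracted text of an original and a migrated file and report
# a rough similarity score. Layout, styling and embedded objects are ignored,
# and those are often exactly where migration loss occurs.
import difflib
from pathlib import Path

original_text = Path("report_v1_export.txt").read_text(encoding="utf-8")  # hypothetical
migrated_text = Path("report_v2_export.txt").read_text(encoding="utf-8")  # hypothetical

ratio = difflib.SequenceMatcher(None, original_text, migrated_text).ratio()
print(f"Text-level similarity after migration: {ratio:.3f}")
```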
Fixity Checking: How often do we need to check the fixity value of any particular digital file to ensure that it remains the same? Is there a risk in touching files too much? Is there an optimal amount of contact that will ensure authenticity while limiting risk and cost? Will regular fixity checking give us more accurate error rates for different types of digital storage? Are there increases in error rates based solely on fixity checking? What are the actual computing costs of checking the fixity of digital files at scale?
Bill Lefurgy described the importance of file fixity in an earlier post as “critical to ensuring that digital files are what they purport to be, principally through using checksum algorithms to verify that the exact digital structure of a file remains unchanged as it comes into and remains in preservation custody.” The NDSA is making efforts to uncover member approaches to file fixity through its regular “storage survey,” and individual members are aware of the value of regularly checking the fixity of the digital materials under their purview. The SCAPE project is looking at this, as is the computer industry. Still, it’s the digital preservation community that is taking the lead in considering these issues, and much more work needs to be done to get some basic data on what happens when we do these types of activities.
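To make the questions concrete, a single fixity check is not much more than the following sketch; the file path and the digest recorded at ingest are hypothetical placeholders:

```python
# Compute a SHA-256 digest of a file in chunks and compare it to a stored value.
import hashlib

def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Hypothetical values: the path of a preserved file and the digest recorded at ingest.
stored_digest = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"
current_digest = file_sha256("masters/item-0001.tif")

if current_digest == stored_digest:
    print("Fixity check passed: the bit order is unchanged.")
else:
    print("Fixity check FAILED: the file differs from what was ingested.")
```

Everything asked above (how often to run this, across how many copies, at what computing cost) scales up from this one comparison.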
Email Archiving: What are the main challenges of email archiving? How can preserved email be made accessible? Is it possible to “weed” irrelevant email messages from those that are archival through automated processes? How can email attachments be preserved along with the messages themselves? How much storage does an average email archive require?
Email archiving is a prime concern for archival institutions, especially those in government. Email archiving solutions are strongly weighted towards the type of email system employed by the organization, and as such, much of the research in the backup and storage of email has been ceded to the information technology industry. It’s uncertain whether the IT approach takes archival concerns into consideration, however, and there remains a shortage of research on email from the archival perspective that might inform IT industry practices. The Collaborative Electronic Records Project focused on the preservation of email, and there has been some research on the archival side into tools that make email archives accessible, such as Muse. Chris Prom’s definitive DPC Technology Watch Report on Preserving Email (PDF) suggests a wide range of potential research paths, but it’s unclear if more practical work has built on his excellent observations.
Thoughts on the above questions? Areas that you think need further research? We’d love to hear your thoughts in the comments.
During Preservation Week 2013, I gave a webinar about personal digital archiving. Over 600 people participated and, during the post-presentation question section, 91 people submitted questions online. I had time to answer about a dozen or so. After the webinar, the hosts from the Association for Library Collections and Technical Services sent me the complete list of questions and I’m gradually responding to all of them. Questions are always good because they help us improve and expand our information resources.
The questions covered a variety of topics — email preservation, file naming, digital video, file migration, scanning and digital asset management — but the most striking fact is that two-thirds of all the questions could be grouped into just two main topics: digital photos and storage.
Interest in digital photos is not surprising. Most of the questions we get at NDIIPP personal-digital-archiving presentations are related to digital photos. The webinar questions about storage were also not surprising; with the variety of available digital storage options and the uncertainty about their reliability, storage can be a perplexing topic.
I’d like to share a few of the webinar questions in this post. There’s not enough space to cover both topics today so I will just do the digital photo ones. I will post the digital storage questions in a future column.
Photographer David Riecks, of photometadata.org, helped answer the more difficult questions. Since many of the questions were variations on the same theme, I mashed some of the more representative ones together.
Which is better for preservation, JPEG or TIFF? I have heard that TIFF is better because of degrading. Do JPEGs deteriorate?
TIFF is a lossless format, and newer versions of photo-processing applications such as Photoshop have options to save TIFF files with various forms of lossless compression. A lossless file format is especially good if you plan to return to the file to make tone or color changes, or to retouch the photo. When you finish with the file and close it, there is no lossy compression and no image data is lost.
TIFF files require more storage space than JPEGs because of their larger, data-rich sizes, so some photographic organizations use a form of lossless file compression called LZW. It takes a bit of time to pack the file, and each time you open the file it may take a bit of time to expand it, but no data is thrown away and the image does not degrade over time.
If you scan a photo, it is a good practice to save the scan as a TIFF, rather than as a JPEG or PDF, because of the TIFF’s losslessness. In addition, if you want the maximum quality, you can even capture and save up to 16 bits per channel in an RGB TIFF; JPEG only allows for 8 bits per channel.
If you want to share a digital photo that is in a TIFF file format, saving or exporting a copy of it as a JPEG is a fine option. A JPEG can be viewed in a web browser and takes less bandwidth to transmit or download. Always keep the original TIFF, though.
If your original digital photo file is a JPEG and you don’t intend to modify it, you can archive it as it is. There is no benefit to converting it to a TIFF if you are not going to modify it. The “lossy” aspect of JPEG becomes an issue when you modify the JPEG and save it — and consequently compress it.
JPEG compression of image data results in some loss of image information, which is why it is referred to as lossy. Compression is not inherently bad; light compression reduces a file size and the lost image information is barely visible. But the more you compress a file, the more information you lose and the worse the photo looks. Once that digital information is lost, you can never get it back.
If you take a TIFF file and save it as a high quality JPEG with a low compression setting, the JPEG may occupy a fraction of the disk space that the TIFF would have occupied. However, if you were to open the JPEG again, make tone or color changes and then re-save it, you would subject it to another round of compression; after multiple rounds of modification and re-compression you would begin to see degradation in the image file.
The amount of compression applied to a JPEG file is an important factor in its quality. In Photoshop, there is more than one way to create a JPEG; one uses a quality scale of 1 to 12, with 12 applying the least compression (“maximum quality”) and producing the largest file size. Quality equals size. The higher the quality, the larger the file size; the lower the quality, the greater the data loss and the smaller the file size.
The type of JPEG compression applied in a camera will be different from that used in Photoshop. Some newer cameras have several settings, ranging from a “Basic” JPEG to a “Superfine” JPEG. These settings probably have rough equivalents in Photoshop, but they are not exactly the same.
When modifying digital photos, never modify the original. Always make a copy and modify the copy. You can compress copies for upload or delete copies if you are not happy with the results. Be careful to save the copy with a different name than the original; otherwise it will overwrite and replace the original.
The JPEG 2000 format has both a lossless and a lossy means of compression. Like TIFF, JPEG 2000 can store files with more than 8 bits per channel, though it requires less storage space than a TIFF. Note that while you can substantially reduce a JPEG 2000 file size, there are fewer applications that can create and open this file format compared to a TIFF. If you are considering converting your files to JPEG 2000, do some tests first.
Here’s a tip: if you open a JPEG image in a photo-processing application, modify it and save the retouched image as a TIFF (with or without LZW compression), then this TIFF image will not be any further degraded or compressed than the original. However, if you apply curves or levels to the image, then you will more than likely introduce some loss of data, since both these ways of modifying the tonal distribution of the image do so by squishing or stretching out the original data.
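As a hedged sketch of that workflow using the Pillow imaging library (the file names are hypothetical examples, not files from the webinar):

```python
# Open an original JPEG, leave it untouched, and save an editable archival copy as TIFF.
from PIL import Image

with Image.open("vacation_042.jpg") as img:  # hypothetical original; never overwritten
    # Saving as TIFF with LZW applies lossless compression: smaller than an
    # uncompressed TIFF, but no image data is discarded.
    img.save("vacation_042_edit.tif", compression="tiff_lzw")

# Any retouching, curves or levels work can now happen on the TIFF copy.
# Re-saving the TIFF does not add another round of JPEG compression.
```

The original JPEG is never re-saved, so it never goes through another round of lossy compression; all retouching happens on the lossless TIFF copy.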
Does adding metadata affect the photo file? If you add descriptive information using particular software, will any other software enable you to view that information or is it all proprietary? Are there any open-source options for adding metadata?
You can modify the metadata about the image — such as caption, description and keywords — with a number of programs. Most of these will only modify the file header information, not the image pixels. [See "An Easy Way to Add Descriptions to Digital Photos," part 1 and part 2.] Adding metadata to a photo file does not subject the image to compression, so the quality of the image will not change. Since the metadata text does take up a little bit of space, the size of the file will increase slightly.
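One way to see this for yourself is to hash only the decoded pixels, ignoring the header, before and after a metadata edit. Here is a minimal sketch with the Pillow library; the file names are hypothetical, and the metadata edit itself is assumed to have been made in whatever tool you prefer:

```python
# Hash only the decoded pixel data, ignoring headers, to show that a
# metadata edit leaves the image itself untouched.
import hashlib
from PIL import Image

def pixel_hash(path):
    with Image.open(path) as img:
        return hashlib.sha256(img.tobytes()).hexdigest()

# "before.jpg" is the original; "after.jpg" is a copy whose caption or keywords
# were edited in a metadata tool. Both names are hypothetical.
print(pixel_hash("before.jpg") == pixel_hash("after.jpg"))  # expect True
```

The two files’ overall sizes will differ slightly, since the added text has to live somewhere in the header, but the pixel hashes should match.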
Information written to the file header of JPEG images can be read by many applications and, in newer computers, even the operating system itself. For instance in Windows Vista and Windows 7/8, the WIC (Windows Imaging Component) allows you to see this information simply by “right clicking” and viewing the image properties. With Macs, from OS 10.5 forward, the information is visible by using “Preview” and Command + I (view info).
If you add metadata to TIFF files, much is the same as with JPEGs, though not all programs will work. Other special and proprietary file formats like Photoshop files (PSD) and camera RAW files (NEF, CR2) are even more problematic in terms of image metadata and review by other programs.
Most software uses the IPTC or XMP standards to store embedded photo metadata. Picasa uses the older IPTC standard. Photoshop uses XMP for storing metadata, which includes the IPTC Core, IPTC Extension, PLUS and more. Information entered with Picasa can be read by Photoshop. The reverse is not always true.
You can find a list of photometadata resources at controlledvocabulary.com.
Does frequently opening digital photos (JPEGs) degrade their quality, or is degradation due to compression?
Moving a JPEG from one location to another will not degrade the image but if the file is corrupted in transit (due to, say, a virus), it will likely not be openable.
It’s important to understand that while compression is used in saving the JPEG file, and the JPEG image has to be decompressed before you can view it, there is no change to the image just through the act of opening the file. Re-compressing the file changes it.
If you “Save” the opened JPEG file, rather than just close the open file (exit without saving), you can cause the file to degrade over time with each “open/save” action. Typically the only time you would be asked to save the file is after modifying the image pixels, such as changing the tone or color, or retouching, cropping or removing red-eye.
You might consider making pixel changes to your JPEG and saving the digital photo as a (lossless) TIFF file.
You mentioned scanning at 300 dpi for the standard photograph sizes. Would you use a different dpi if you were scanning a color photograph versus a black and white photograph?
You could scan a b&w photo using the “grayscale” option rather than the RGB color option, but you’d want at least 300 dpi/ppi regardless.
The following is a guest post by Jefferson Bailey, Strategic Initiatives Manager at Metropolitan New York Library Council, National Digital Stewardship Alliance Innovation Working Group co-chair and a former Fellow in the Library of Congress’s Office of Strategic Initiatives.
A few weeks ago I had the opportunity to teach a personal digital archiving workshop at the Brooklyn Public Library Information Commons, the Central Library branch’s new center consisting of meeting rooms, a training lab and open workspace with a variety of multimedia computer workstations. Having helped run a number of personal digital archiving events at D.C.-area public libraries as part of Preservation Week 2012, I believe in the role that public libraries can play in helping provide guidance to the general public about preserving their digital materials of personal importance. As many public libraries begin to emphasize and extend the role they play in helping citizens and individuals create, document and preserve digital content, much of it of potential social and historic value, there is ever more opportunity to advocate for personal digital archiving directly with the local community through workshops and special events.
NDIIPP, of course, provides a wealth of guidance on personal archiving for individuals. NDIIPP also provides guidance to libraries and other institutions planning and running an event or program through the Personal Digital Archiving Day Kit. Ongoing interviews and case studies featured on The Signal, such as posts calling for more citizen archivists and posts highlighting the work of public libraries teaching personal archiving, are just a couple of other ways the program has spread the word about personal digital preservation.
My workshop, “Save Your Digital Stuff,” was largely built on NDIIPP’s guidance, and attendees had questions both expected, such as those about file naming practices for digital images and scanner and format recommendations for digital conversion, and unexpected, such as how to preserve one’s online dating profile. Workshop participants were also interested in the overall role that archivists play in preserving digital information, and some minor hilarity ensued when the conversation turned to digital wills and one inquisitive attendee wanted clarity on the difference between an archivist and an actuary.
As is often the case, workshop attendees evinced a mild bewilderment at how best to manage and save what often feels like a deluge of digital content but were interested in practical strategies and tools for how best to undertake saving their digital stuff. I emphasized that knowing what you want to preserve is one of the most crucial steps in personal archiving and that, after you have accomplished that, the rest of the steps fall in place naturally.
Overall, it was a fun event and I continue to think that public libraries, many of which provide technology training to the public and many of which also house local history collections, are ideally situated to help proselytize and advise citizens and communities on how best to preserve their valuable digital materials. I look forward to continued collaboration between public libraries and preservationists on supporting digital preservation for the people.
The following is a guest post by Madeline Sheldon, a 2013 Junior Fellow with NDIIPP.
Earlier this week, I visited the Smithsonian Institution to attend a talk by Courtney Johnson, Director of The Dowse Art Museum in New Zealand. Energy oozed from Johnson; she exuded a confidence that was easy-going, without being arrogant. Her background as an art historian, web manager and a scrum master made her a professional force to admire. According to Johnson, a scrum master helps businesses make pertinent decisions about projects or plans as quickly and efficiently as possible. Doing so would allow the business to “fail faster and fail smarter,” eventually becoming more comfortable–and more confident–with any future choices they decide to make.
In her discussion, Johnson frankly admitted that she, as a new museum director, probably “makes mistakes” and “breaks the rules” when making decisions for The Dowse. For example, Johnson revealed that she often seeks out patron input for future displays, and allows their opinion to affect future installations.
To her, any failure or mistake she makes will be a future lesson or adaptation from which to learn. The confidence she places in her decisions keeps day to day business moving forward, never stagnant. Most importantly though, Johnson’s calculated failure keeps her actions transparent, and ensures that she personally advocates for the patrons she serves at The Dowse.
So what does Courtney Johnson’s discussion have to do with digital preservation? Directly, it doesn’t; however, her notion of scrum and failure stuck with me as I continued my research into digital preservation strategies. After hours of online searching, I recorded a significant jump in the number of published digital preservation strategies/policies from 2008 to the present day. (A digital preservation strategy/policy outlines an institution’s high-level plan for the preservation of digital objects, e.g., born-digital/digitized documents, photos and/or research data.)
As far as I can tell, libraries and archives consistently remain at the forefront of the creation and implementation of digital preservation planning; museums, notably, maintain a distant third place. The National Museum of Australia’s Digital Preservation and Digitization Policy served as the one example and exception I could find.
Despite this, I did find exciting initiatives surrounding time-based media (e.g., video, animation, audio) from organizations, such as Rhizome, the Guggenheim, and the Tate. Based on these initiatives, it appears that museums are fully invested in the preservation of time-based media, but few have taken the next step towards compiling their experiences into a definite strategy or policy.
During my research, I noted that digital preservation strategies/policies vary considerably from institution to institution. As every organization must take into account the constraints or abilities of its resources, digital preservation plans do not necessarily follow an exact format. As more and more institutions publish preservation plans for digital content, it will become easier for repositories lacking documentation to build upon their work. With that said, continued collaboration within the museum community will be needed for future innovation.
While decisions regarding the preservation of sensitive work cannot be made lightly, museums might want to adopt the scrum approach to decision-making and become a more vocal presence in the world of digital preservation. Whatever the case, someone will need to take the next step. Who will it be?
My first foray into online communities was in the mid- to late-1980s, when the organization I worked for got some of its online services through UCLA. We got limited access to email and access to the Usenet discussion system. If you’re not familiar with Usenet — which went live in 1980 — surprise! It’s still around. I read threaded discussions on technical topics, but I don’t remember actively participating.
My real introduction to active participation in online communities was CompuServe, which went live in its first incarnation in 1969. I got my CompuServe account in 1988. One dialed in using a modem (I still remember my first 2400-baud modem) and signed up for a set of topical bulletin boards. I know that I participated in a gardening board, a board dedicated to mystery books, one for science fiction, and I don’t remember what all else. These boards hosted active and lively discussions, as well as private messaging between members. In fact, my first real email address came through CompuServe in the late ’80s, when they activated accounts that could send and receive email to any host.
I have a friend who has been a member of The Well for more years than I can remember. That was an acronym for Whole Earth ‘Lectronic Link, and launched in 1985 as a companion service to the Whole Earth Review and the Whole Earth Catalog. Want another surprise? The Well is still alive and running, and a version of the Whole Earth Review is still online for its members.
In the 2000s I was part of a vibrant online community called Readerville, dedicated (mostly) to discussions about books across all genres. I met many people who remain my friends today. I miss it every day.
And, of course, there is a lengthy history of online community bulletin board systems (or BBSs) — starting in the late 1970s — that have come and gone over the years. And mostly, they have gone. These BBSs often play an important role in researching and documenting computing history, whether for cultural historians studying underground culture, for those studying the history of computer game development and game play, or even for documenting the development of an RL (Real Life) community.
If you want to learn about BBSs, their history and the role they have played in various communities, you could not do better than to watch the documentary BBS. The film’s director, Jason Scott, is also the founder of textfiles.com, dedicated to the preservation of BBS content. We interviewed Jason for The Signal, and his ArchiveTeam web archiving effort has just been announced as a 2013 NDSA Innovation Award winner. ArchiveTeam has been a key participant in the preservation of many web communities as well. Separate from these communities serving as a source of documentation for research or technology preservation, the participants in BBSs and online communities often have little opportunity to document their participation and contributions when a service shuts down out of economic necessity or corporate decision. These communities have become a highly visible digital preservation target.
As a sidebar, in the late 1990s I needed to gather my personal records related to my seven years on the board of the Museum Computer Network, and much of my early official email was through my CompuServe account. I still had my application floppy for CompuServe Navigator, which I was able to launch on an older Mac, retrieve my account, export my mail as text and add to my records archive. And I have been migrating that data forward across media ever since.
The following is a guest post by Jefferson Bailey, Strategic Initiatives Manager at Metropolitan New York Library Council, National Digital Stewardship Alliance Innovation Working Group co-chair and a former Fellow in the Library of Congress’s Office of Strategic Initiatives.
The National Digital Stewardship Alliance Innovation Working Group awards team is excited to announce the 2013 winners of the NDSA Innovation Awards. In this, the second year of the NDSA Innovation Awards, four outstanding individuals and projects have been recognized for their contributions to innovation in digital stewardship. Last year’s winners can be seen on a previous post on The Signal.
Selected from a large pool of nominations, this year’s Innovation Award winners represent the creativity, collaboration and willingness to explore novel approaches to complex challenges that define innovation in the preservation and accessibility of digital content. The four winners also represent the diversity of institutions, projects, individuals, and communities working to provide stewardship to digital materials of value.
The awards will be handed out at the upcoming Digital Preservation 2013 conference, July 23-25 in Washington, D.C., where the winners will also give brief presentations on their projects. As with last year’s Innovation Award recipients, we hope to feature full interviews with each of the winners here on The Signal.
Please join us in congratulating the 2013 Innovation Award winners:
Future Steward: Martin Gengenbach, Kansas Historical Society. Martin is recognized for his work documenting digital forensics tools and workflows, especially his paper, “The Way We Do it Here: Mapping Digital Forensics Workflows in Collecting Institutions” and his work cataloging the DFXML schema.
Individual: Kimberley Schroeder, Wayne State University. Kim is recognized for her work as a mentor to future digital stewards in her role as a lecturer in Digital Preservation at Wayne State University, where she helped establish the first NDSA Student Group, supported the student-led colloquium on digital preservation, and worked to facilitate collaboration between students in digital stewardship and local cultural heritage organizations.
Project: DataUp, California Digital Library. DataUp is recognized for creating an open-source tool uniquely built to assist individuals aiming to preserve research datasets by guiding them through the digital stewardship workflow process from dataset creation and description to the deposit of their datasets into public repositories.
Organization: Archive Team. The Archive Team, a self-described “loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage,” is recognized both for its aggressive, vital work in preserving websites and digital content slated for deletion and for its work advocating for the preservation of digital culture within the technology and computing sectors.
Congrats again to this year’s Innovation Award winners! We thank everyone who submitted a nomination and also thank the entire community working to advance digital stewardship.
I started writing before computers were commonly available. But, unlike some who are nostalgic for the era of pen and ink, I feel only joy about relying on machines in my struggle to communicate with written language.
My handwriting was inelegant from the start. I never bothered to ask if neatness counted, because it didn’t matter–my penmanship in elementary school could not, even with abundant time, ever aspire to crisp clarity. Hard as I tried, the results were always disappointing.
Later in high school and college I moved on to typewriters, which offered their own torment, such as the anguish of spotting a misspelled word on a freshly completed page, or worse yet, realizing that a paragraph needed to be restructured or deleted. I noticed early, to my great annoyance, that a typewritten manuscript read differently, and often called for revisions that were not obvious in a handwritten draft. It took some time before I appreciated that this was actually a good thing, as the additional editing tended to make my writing better. But I still wince at the memory of retyping papers over and over to deal with multiple rounds of edits.
When I got my first office job in the late-1970s there were secretaries who would take a handwritten draft and return a typescript. This often meant the secretary appearing at my desk with illegible words circled in the draft. “Really? That’s what that says? I never would have guessed.”
Around 1980 the secretary started using “the Wang,” one of the first popular office word processing systems. It simplified the secretary’s job greatly, as edits now only involved changing words on the screen and reprinting the document. But it didn’t take long before I began to imagine what it would be like to write directly on the machine with no secretarial mediation. This was a little bit radical for the time, as “aspiring professionals” were supposed to avoid “clerical work.” But the lure was strong: writing freed from messy scribbles and the labor of manual retyping; writing that actually encouraged multiple rounds of self-editing.
My dreams were fulfilled in the early 1980s when I was able to use personal computers in a local university computer lab. I always had a quick answer when the lab overseer asked what I wanted to do: “word processing!” This was one fantasy that not only came true, it exceeded my fondest hopes, even when saddled with a clunky DataPoint terminal during a job in the mid- to late-1980s.
I am, of course, far from alone in embracing computer-aided writing, and the change in our homes and our workplaces has been profound. One way to think about the extent of the change is the degree to which it has faded into the background of everyday life.
Today, word processing is increasingly assumed and requires less notice. A quick search of a very large online collection of books dating from about 1970 to 2008 shows the rapid rise and fall of the term “word processing” over that time. We’re using it more and more but marvelling at it less and less.
Except for me. There is nothing like a memory of past suffering to make one feel gratitude for present blessings.
This is a guest post by Camille Salas, the Viewshare.org coordinator for the Library of Congress.
This past December, I shared a lesson plan that uses the Library’s Viewshare platform to create digital libraries. Dr. Erik Mitchell, who teaches the Organization of Information class at the University of Maryland’s iSchool, created the lesson plan. After a pilot test in Spring 2012, he expanded the lesson plan to make the design and implementation of Viewshare a multi-week capstone project in his Fall 2012 course.
The project involved students working in groups to create a digital library with Viewshare and to present their project using a pre-recorded video that served as a showcase for their work. Dr. Mitchell was kind enough to invite me to the last class to hear students’ presentations and their project experiences with using Viewshare. In total, his students created nine views that showcased a variety of themes for specific audiences. Views focused on subjects as diverse as Baltimore art murals, chili recipes by region, cultural tourist attractions, photographs from World War II and Renaissance art works.
Collectively, the targeted users included: art students, business owners, cooks, educators, historic preservationists and tourists. Following the presentations, I had an opportunity to speak with Dr. Mitchell about the presentations and how Viewshare aids in the professional development of future information professionals.
CS: This is the second time you used Viewshare in your Organization of Information class, and I noticed that you revised the original assignment a bit. What kinds of things have you discovered about using the tool as a way to teach cataloging?
EM: The student feedback on Viewshare from the Spring 2012 semester was so positive I decided to not only keep the tool as part of the class but also expand its use to give students a platform and enough time to design and implement a fully-functional information service. Viewshare works well in courses about Information Organization because it helps bring together all of the concepts that we talk about during the semester – from metadata and information design to information use and community outreach. In addition, Viewshare makes it easy to catalog and deploy a digital library but it also gets more powerful as you put better metadata into it. I thought one of the greatest examples of this was seeing students use the timeline and map features to show off some really sophisticated metadata creation work.
I think that a Viewshare assignment is a good substitute for a research paper because it unifies the content of the course and encourages some in-depth exploration of these ideas. The students responded well to the opportunity to do group work and I think they liked recording the video to showcase their product. You could really see the production work that went into these videos and, while my first motivation for using videos was that I had both in-person and online students, I think it added a nice dimension to the course that would otherwise have been missed. I have to say that the students pushed back at first but I think that they ultimately enjoyed it and got something out of the process.
Another outcome that I really did not think about until I saw some of the projects was the fact that Viewshare helped students interested in public service and School Media settings get an idea of how the concepts we studied in the Information Organization course can have an impact across the profession.
CS: As a former student, I noticed that you emphasize learning a diverse set of technological skills throughout the semester, which I did not expect in an Organization of Information course. I’m interested in how this fits into the vision of training future information professionals.
EM: I like to start off the course by exploring HTML because I feel like we all need a refresher and I also think that HTML is a great hybrid technology that includes document structure and context (metadata), information design, and user interfaces. We use that as a jumping-off point to learn about XML while we also learn about metadata schemas and eventually make our way around to tools and technologies that help us understand what kinds of uses metadata has on the web (like HTTP, OAI-PMH and other metadata-rich services). It can be tough to fit all of this in while also talking about the core concepts of classification and document representation, but I think we do a fair job of it. This is also a reason why Viewshare fits well with this particular course. Although we spend a lot of time learning the technical nuts and bolts of information organization work in the class, Viewshare brings all of the concepts together without putting too much burden on having to master the technical parts. And yes, while technology can be unexpected in a course like this, I think that a gentle introduction to the building blocks of our field yields great benefits.
CS: With respect to the group projects, I noticed that the majority of groups developed their own metadata schema. Why do you think it was difficult for the students to strictly adhere to one?
For example, one of the views created is about Renaissance Royalty Portraits. The group used publicly available data and images of 21 royalty portraits to create views including but not limited to: a gallery of portraits, maps of portrait repositories and home countries of subjects, timelines of portrait creation and royalty reign, and a scatter plot of portrait sizes. When it came to selecting a schema for their metadata, the group used the Categories for the Description of Works of Art and other resources from the Getty Research Institute as guidance. Ultimately, they did not strictly adhere to any one schema, which seemed to be a shared experience for all the groups.
EM: In the fall semester we had a long discussion about this. Initially I really wanted students to pick a standard we had looked at so that they had to work with the standard and get to know it in detail. Interestingly, as students branched out to all of these incredible content areas the standards we had talked about like MARC, Dublin Core and EAD and the vocabularies used in them really did not adequately fit the need. Rather than making the whole project center on metadata and fitting everything into pre-defined schemas, we decided to open things up. Students did have to talk about how they evaluated their metadata need and whether or not they were able to reuse any standards but then they were also free to create a new schema that pulled together standards. While there was a tradeoff in that situation, I think the quality of the projects shows that it yielded some great dividends, particularly in helping students to design digital libraries that were directed at specific user communities. As a result, I guess I decided that having students develop their own metadata schema does not really miss the point of a project in an Information Organization class because it helps them make decisions grounded in how the tool will be used.
CS: Do you have any suggestions for improving Viewshare?
EM: I really enjoyed watching students become experts in loading data, working with data formats, augmentation, and applying the data in different views. This involved quite a bit of trial and error as students had to catalog their objects using a spreadsheet, upload the data, publish the digital images and test the system. This led to quite a few comments about making that process easier, and it was not until you came to class and showed us the “refresh” button that we knew we could have saved a few steps in this process. It was great to see all of the approaches to working with metadata in a group setting. We had students picking different collaborative platforms for creating metadata and image-publishing environments, and as a result we found out that not all cloud-based document sharing and publishing sites work the same way!
One neat part of the classroom discussion during the presentation of final projects was hearing ideas about other visualization tools and interest in some different approaches to topic or subject display. We had touched on a range of visualization tools, including graph-based visualizations, so those came up a few times. There were some questions about how to make Viewshare self-sustaining as well. Towards the end of the semester we got into OAI/PMH, so we only had a bit of time to talk about how Viewshare implements that harvesting standard, but I think that if the process of data ingest and normalization could be automated, it would have been a great way to show students how these systems are used in production environments.
Many students take the class in their first semester so they have never really worked with raw data. While I think a project like this is a great way to give them a soft introduction to working with and analyzing data, we definitely could have used some workshops, videos or other focused tutorials on working with advanced visualization tools.
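For readers who have not seen OAI/PMH in action, the harvesting Dr. Mitchell mentions above boils down to an HTTP request that returns XML records. Here is a minimal sketch in Python; the repository URL is a placeholder and the third-party requests library is assumed, so this is only an illustration of the protocol, not a course exercise or Viewshare's internal code.

import xml.etree.ElementTree as ET
import requests  # third-party HTTP library, assumed to be installed

# Placeholder endpoint; any OAI/PMH-compliant repository would work here.
BASE_URL = "https://example.org/oai"

# Ask the repository for records in the simple Dublin Core (oai_dc) format.
response = requests.get(
    BASE_URL,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
response.raise_for_status()

# OAI/PMH responses are XML; Dublin Core elements live in their own namespace.
ns = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}
root = ET.fromstring(response.content)
for record in root.iterfind(".//oai:record", ns):
    title = record.find(".//dc:title", ns)
    print(title.text if title is not None else "(no title)")

A production harvester layers resumption tokens for paging through large collections, plus the error handling and normalization mentioned above, on top of this basic request-and-parse loop.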
CS: Do you have any recommendations for other teachers who are interested in using Viewshare?
EM: I think that the use of Viewshare in my class has been pretty successful with just a bit of planning. Because it is so easy to use, I think that it can be either a small or a large piece of the curriculum. This means that it also lends itself to a lot of different use cases. The first class to use Viewshare only worked on it for a few weeks, and as a result we were still figuring out the tool when the assignment was due. Based on that experience, I would recommend weaving Viewshare throughout the course if you are going to use it in depth. For example, this past semester we had you visit the class in week 7 to provide an overview, and then we touched on different parts of Viewshare throughout the semester to illustrate a specific concept or skill. In fact, I think that we had a small part of the final project assignment due about every two weeks.
I know in talking with you about other uses of Viewshare we discussed humanities classes that were using the tool as well. While Viewshare works really well in Library and Information Science instruction, I think it is a great tool for any discipline that wants to spend some time on data and data visualization. In this class we also look at OpenRefine and get students acquainted with the idea of data cleanup and augmentation. From my perspective, Viewshare is the perfect companion to it.
CS: Many thanks to Dr. Mitchell and the students in his class this past fall, who did an impressive job. All of the teams successfully demonstrated the versatility with which information professionals can use Viewshare to manipulate data to better serve the information needs of different audiences. I would definitely like to hear how Viewshare is being used in other academic environments. If you would like to share your story with us, please submit comments.
If you will be in the Washington, DC, area next Tuesday, June 11th, please join us for a Viewshare presentation at the Pickford Theater in the Library of Congress James Madison Building at 11:00 am. The presentation will include examples of how users from across the country and within the Library of Congress are using the tool to explore, interpret, and present digital collections. The presentation is free and open to the public.
This is a guest post by Kate Murray, Audio-Visual Specialist with the Office of Strategic Initiatives.
I suppose I’ve always loved puzzles. There are the standard jigsaw and board game varieties – who doesn’t love a good game of Carcassonne? – but I see puzzles in many different environments. I see them in patterns for knitting and sewing, in recipes, in dance routines, even in language. As a Medieval Literature major in college, my favorite course was one in which we did nothing but translate the epic poem Beowulf from Old English to Modern English. I fondly remember sitting in the library with my translation dictionary in hand, poring over the highly structured vocabulary and grammar. There’s just something about figuring out how individual pieces logically fit together to create a larger entity that I find fascinating.
It is perhaps this love of challenging puzzles that draws me to working with digital file formats. They can be incredibly complex and intricate, especially moving image formats that bring together picture and sound data as well as metadata and other elements in such a way that the complete package plays in the correct sequence and at the right speed. Digital file formats are the ultimate puzzles, and I’m thrilled to work on them in my new job at The Library of Congress.
A few weeks ago, I started work as the Information Technology Specialist (Audio-Visual Specialist) in the Office of Strategic Initiatives. My major responsibilities will be supporting the Federal Agencies Digitization Guidelines Initiative Audio-Visual Working Group and contributing to the Sustainability of Digital Formats website, as well as exploring other issues related to digital file formats and digital preservation. I’ve been an active participant in the FADGI group for several years, most notably on the Broadcast Wave Metadata Embedding Guidelines, so I’m excited to expand my involvement. I’m also looking forward to investigating and documenting new format categories for the sustainability website and to whatever else comes my way. This is my dream job, and I plan to make the most of the opportunity.
Before coming to the Library, I worked on digitization and digital format issues for almost five years at the National Archives and Records Administration as a Digitization Process Development Specialist. In a nutshell, my job was to help coordinate digitization in support of NARA’s goals of preserving and making available collections. In that capacity, I looked at processes and workflows related to increased automation and standardization, including format identification and documentation, migration and transformation strategies, tool identification and testing, and quality assurance and control. One of my most significant projects at NARA was the Products and Services web portal, which outlines the technical specifications for all the products made by the Digitization Division.
Before entering government service, I was the Audiovisual Archivist at the University of Maryland Libraries from 2006-2008 and the Audio and Video Collections Conservator at Emory University Libraries from 2003-2006. I started my library career in the 1990s as the Collections Conservator at NYU Libraries after working in the book conservation lab at Columbia University Libraries as a student employee. I’m a native New Yorker, but I completed my Masters in Library Science at the University of Cape Town in South Africa where I also worked in the Manuscripts and Archives Department on image digitization projects.
The following is a guest post by Marie Gallagher, a computer scientist in the Lister Hill National Center for Biomedical Communications at the U.S. National Library of Medicine (NLM).
(This post is based on “Improving Software Sustainability: Lessons Learned from Profiles in Science,” an interactive paper (pdf) presented at the Society for Imaging Science and Technology’s Archiving 2013 conference, April 2-5, 2013.)
This story begins in the early 1990s at the National Library of Medicine, when our group experimented with arranging, describing and digitizing historical manuscript collections to make the collections searchable and accessible to multiple users simultaneously. Our earliest digital library experiments involved a collection containing correspondence and reports from the 1960s and 1970s. At that time we used a proprietary document management system to collect the metadata, manage the digitized images, and allow for searching across the collection. The proprietary system met our basic needs. However, without access to the source code or support from the vendor, we could neither make changes nor add basic functionality. Over time we replaced components of the system so that we could modify them to meet our evolving needs. Fortunately we were no longer dependent on the proprietary system by the time the vendor was acquired and the product was abandoned.
Today, the metadata and digital images we created using that system survive. We have imported and exported the metadata into and out of different systems over the years. We scanned the papers to TIFF files, and we benefited from choosing a sustainable file format: those original TIFF files survive unchanged as our digital masters today. We carefully copy the TIFFs to new media and verify the copies. So the effort to keep the metadata and digital images alive through the years has been minimally burdensome.
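The “copy and verify” routine described above is essentially a fixity check. As a loose illustration only (the directory names and the choice of SHA-256 are assumptions for this sketch, not our actual workflow), a verification pass over a set of TIFF masters in Python might look like this:

import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to spare memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_copies(masters_dir: Path, copies_dir: Path) -> list[str]:
    """Compare each master TIFF against the copy with the same name on new media."""
    mismatches = []
    for master in sorted(masters_dir.glob("*.tif")):
        copy = copies_dir / master.name
        if not copy.exists() or sha256_of(master) != sha256_of(copy):
            mismatches.append(master.name)
    return mismatches

# Hypothetical directory names, purely for illustration.
problems = verify_copies(Path("masters"), Path("new_media"))
print("All copies verified" if not problems else f"Recopy and recheck: {problems}")

Running a pass like this after every copy, and keeping the checksums alongside the files, is what allows the masters to be declared unchanged with confidence.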
Keeping alive the software required for our digital library to function has been a different experience. This software enables metadata creation, quality assurance, digital item management, and access to items and metadata through our current digital library’s Web site, Profiles in Science®. Our digital library’s basic software architecture has remained fairly stable, as have our metadata schema and digital items. But the effort to keep alive the various software components we depend upon has been ongoing and seems to have grown over time. We used public domain or open source software where possible, used proprietary software where the benefit outweighed the cost, and wrote our own software when necessary.
The effort needed for ongoing software upgrades and replacements will come as no surprise to software architects, developers, programmers, security experts and others who use or develop software. But the need may be less obvious to anyone more removed from these activities. The ongoing need for upgrades and replacements has sometimes prompted the question, “Why can’t you just build it and leave it alone?”
A look back at the history of some of our software upgrades and replacements provides some answers. Clearly some changes were necessary in order to add new features. But avoiding new features still would not have eliminated the need for replacements and upgrades. Some software replacements and upgrades were necessary because of external threats to the stability of our software, including hardware or operating system incompatibilities, loss of backward compatibility, loss of needed functions, new policy requirements, product abandonment, product support and licensing costs, security flaws and software bugs. Not responding to these threats could eventually have resulted in an inability to create or edit metadata and digital items, as well as a loss of access to our digital items and metadata, not to mention the exploitation of security flaws to harm our systems or others.
The technological landscape will continue to change. And we will want to be able to make changes and add new features to better manage and provide access to digitized collections. We will want to keep software maintenance costs as low as possible.
Eliminating the threats and effects of technological obsolescence altogether seems unattainable, but we might be able to delay or diminish them. When we have a choice, we can try to make choices that encourage the sustainability of our software: software that offers access to its source code, is widely used, well tested, actively developed, well documented and not overly customized, follows standards, has acceptable licensing terms and import/export capabilities, runs on multiple platforms, and supports backward compatibility. More suggestions are welcome.
The following is a guest post by Michelle Gallinger, Digital Programs Coordinator with NDIIPP and the National Digital Stewardship Alliance facilitator.
D-Lib Magazine, the magazine of digital library research, has recently published “NDSA Storage Report: Reflections on National Digital Stewardship Alliance Member Approaches to Preservation Storage Technologies.”
The structure and design of digital storage systems is a cornerstone of digital preservation. To better understand ongoing storage practices of organizations committed to digital preservation, the National Digital Stewardship Alliance conducted a survey of member organizations. The article reports on the findings of the 2011 survey, which also inspired a series of blog posts on The Signal. The results of the survey provide a frame of reference for organizations to compare their storage system approaches with NDSA member organizations.
The NDSA Infrastructure working group has reworked the survey to better identify key storage trends and issues as well as to improve clarity of responses. The 2013 Storage Survey was opened on April 25th and the group is currently soliciting responses from the NDSA community. Those interested in the survey questions can feel free to contact me at mgal at loc.gov.
Responses to the survey are interpreted and reported in aggregate. The responses from the 2013 survey will inform a forthcoming report, similar to the D-Lib article on the 2011 survey mentioned above. The NDSA Infrastructure working group hopes that this series of surveys will continue to provide critical insight into the storage needs and practices of the preservation community. The working group plans to run the Storage Survey every two years, and these regular surveys will provide valuable longitudinal data that will help determine the trajectory of storage needs for digital stewardship over time.
Preliminary results from the 2013 Storage Survey will be shared in September 2013.
In this installment of the Content Matters interview series from the National Digital Stewardship Alliance Content Working Group, we’re featuring an interview with David McClure, a Web Applications Specialist on the R&D team at the Scholars’ Lab at the University of Virginia. David is working on the Omeka + Neatline project and pursuing research projects that explore the idea that software can be used as a tool to inform, extend, and advance traditional lines of inquiry in literary theory and aesthetics.
David was a big part of our Why Digital Maps Can Reboot Cultural History panel at the South By Southwest 2013 conference and brings a unique background to his work with digital maps and cultural heritage.
Butch: Tell us briefly about what the Scholars’ Lab at the University of Virginia Library does and the philosophy behind it.
David: The Scholars’ Lab is a digital humanities center at the University of Virginia Library. The lab was formed in 2006 by combining three longstanding departments into one – an electronic text center, a team of GIS and social science data specialists, and a research computing service from UVa’s central IT division. Now, we’re a two-part organization: We have a public-services team that does classroom teaching, project consulting, and runs a high-end computing lab, and a research and development group that builds digital humanities software behind the scenes. I’m one of the engineers in the R&D outfit.
We build new software like Neatline, assist with faculty projects, and provide general consultation services about digital research methods. There’s also a big focus on maintaining active research profiles on an individual basis – on top of the regularly-scheduled projects, Scholars’ Lab staff devote 20% of their time to completely independent research efforts. These have been extremely productive over the course of the last few years, resulting in major projects like Blacklight and Neatline.
We’re also really committed to the ongoing effort to rethink graduate education in the humanities. We’re currently in the second year of the Praxis Program, a fellowship that brings together a group of six graduate students for a year-long bootcamp that teaches the skills needed to build collaborative digital projects – programming, design, usability testing, communications, project management, etc.
Butch: Tell us briefly about your background and how you ended up at the Scholars’ Lab.
David: I have a sort of zig-zagging academic background. I went to a specialized math and science boarding academy for high school, but then went to Yale and majored in the “Humanities,” an interdisciplinary program that combines literary studies, philosophy, and intellectual history.
I started programming sort of by accident at the end of college, and fell completely in love with it. It was a perfect combination of math and literature – artistic and analytical at the same time. After graduating in 2009, I realized that there was a whole community of people in the digital humanities with the same combination of interests. I shifted into full-time work as an independent developer after about a year, built a couple side projects, and had the good fortune of joining the group here at the Scholars’ Lab in spring of 2011.
Butch: Talk about the Neatline project. How is Neatline different from other mapping projects that work to make historic mapping materials more accessible?
David: In the past, digital maps have often been used as purely analytical tools – a lot of work has focused on creating automatic visualizations of large historical data sets that are too big to reason about without the aid of the computer. Those approaches are incredibly effective, but they’re also a departure from how humanists are used to thinking about the concept of place – as something contextual and subjective, a shifting landscape that means different things to different people at different times.
Neatline is interested in offering a qualitative complement to the quantitative methodologies. We’re trying to build a set of tools that make it possible to create really interpretive maps that are capable of representing narrative progression, uncertainty, and change over time.
Butch: How have the technologies of digital mapping changed over the past five years? How have those changes affected the work you do?
David: Actually, I’d argue that a lot of the core technologies we’re using to build web-based mapping applications haven’t changed that much in the last five years – but that they’re going to change a lot in the next five. Many of the libraries and components that we use in projects like Neatline are established codebases that first emerged in the mid-2000s. A lot of those projects are getting towards the end of their life cycles, which opens up space for new approaches. For example, we’ve been really excited to watch the early stages of development on OpenLayers 3.0, which we’ll integrate into Neatline as soon as it reaches a stable release.
Looking forward, I think a big paradigmatic shift is the move to 3D representations of terrain on the web (like Google Earth, but implemented natively in the browser). There are lots of interesting ideas that open up once you have access to the vertical axis – I’m excited to get my hands dirty with it.
Butch: Where are the current gaps in terms of tools and services to help digital storytellers do their work with maps? What are some tools, approaches or initiatives that might remake the future landscape of digital mapping?
David: Thinking back on the last couple months of work on Neatline, two things come to mind. First, I think that existing tools for drawing visual annotations on maps are less sophisticated than what’s available in other domains. When you look at old hand-drawn maps, there’s often an incredibly intricate level of interpretive illustration that’s layered on top of the basic geography – the map is a canvas, not just a spatial grid. Most digital map-making frameworks, though, expose a pretty simplistic set of annotation tools – points, lines, polygons, etc.
In the current release of Neatline we’ve made it possible to take SVG vector graphics created in programs like Adobe Illustrator and Inkscape – which make it easy to create really complex, smooth geometries – import them directly into Neatline, and drag them out to a specific size and orientation on the digital map. (I wrote about this recently on our blog.) This is a big step forward, but we can still do better – I’d like to see really powerful vector editing tools integrated directly into the spatial environment.
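To make that SVG-to-map step a little more concrete, here is a rough, hypothetical sketch in Python (it is not Neatline’s actual importer) of how the coordinate pairs in a simple SVG path might be pulled out and rewritten as a WKT geometry that a mapping library or spatial database could then scale and position:

import re
import xml.etree.ElementTree as ET

# A tiny stand-in for a vector-editor export: one path using only absolute
# move/line commands (no curves), to keep the example readable.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <path d="M 10,20 L 30,40 L 50,25" />
</svg>"""

root = ET.fromstring(SVG)
path = root.find("{http://www.w3.org/2000/svg}path")

# Pull the numeric coordinate pairs out of the path's "d" attribute.
pairs = re.findall(r"(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)", path.attrib["d"])

# SVG's y axis points down, so flip it before treating the points as map space.
points = [(float(x), -float(y)) for x, y in pairs]

# Express the geometry as WKT, a plain-text format most spatial tools accept.
wkt = "LINESTRING (" + ", ".join(f"{x} {y}" for x, y in points) + ")"
print(wkt)  # LINESTRING (10.0 -20.0, 30.0 -40.0, 50.0 -25.0)

A real importer also has to cope with curves, transforms and relative path commands, which is exactly the kind of fiddly translation work a tool like Neatline takes off the user’s hands.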
Second, I’d argue that spatial “storytelling” is still an unsolved problem in many ways. It’s easy to show where things are on maps, but how do you represent the kinds of progressions and movements that make stories work? We’ve experimented with a lot of user-interface approaches to this problem (numbered labels, timelines, waypoints that point at specific objects or locations). But I’ve always felt that something was lacking.
I think part of the problem is that “stories” often have a kind of original, native existence as texts – stories tend to be spoken or written, and there can be something disconcerting about trying to depart too dramatically from that basic format. We’ve been experimenting with new approaches that try to combine the narrative power of texts and the graphical interactivity of digital maps. We’re currently working on a project called “Neatline Editions” that will make it possible to link individual paragraphs, sentences, and words in a text document with objects and locations on a map.
Butch: In NDIIPP we’ve started to think more about “access” as a driver for the preservation of digital materials. To what extent do preservation considerations come into play with the work that you do? How does the provision of enhanced access support the long-term preservation of digital geospatial information?
David: I tend to think of preservation and access as two sides of the same coin – each validates and reinforces the importance of the other. In fact, Neatline emerged from a positive feedback loop between the two. Back in 2009, the Scholars’ Lab had just finished building a new portal website that made it easier for library users to search for geospatial holdings. Once that foundational work was in place, though, the question became: What do you do once you find these materials? How do you make sense of them? How do you make arguments about them? Show them to other people? Mix and match them into new combinations?
Neatline was an effort to answer those kinds of questions, to make spatial materials accessible and actionable in new and interesting ways. When people do interesting things with the content, it builds a whole community of users committed to sustained, long-term preservation efforts.
My colleague Leslie Johnston blogged last week about computer hardware preservation and declared a change of opinion on the subject. Her motivation came as a result of discussions at a recent Library of Congress invitational meeting, Preserving.exe: Toward a National Strategy for Preserving Software.
I attended the same meeting and also changed my opinion–but in the opposite direction. (It’s a good meeting when the ideas presented shake things up a bit).
Leslie said she now favors emulation over hardware preservation as the means for providing the computing environment to access application-dependent content. Maintaining the many, many hardware configurations just to run all the different generations of video games, for example, is not practical for most institutions. Once built, an emulator is much easier to manage, as it runs on modern commodity equipment. And since many games and other applications already have successful emulators, there is every reason to expect that approach to work broadly going forward.
All this is undeniably true. I embrace emulation as the most efficient and sensible approach to reanimating older code. Apparently Bruce Sterling, citing Leslie in Wired, does as well.
But let’s hold on a minute. Even if emulation serves the vast majority of access purposes for collecting organizations, there is undeniably more to be learned from older applications than just bringing the code to life. Original hardware allows for a much fuller recreation of the original experience of the application. The browser emulation of K.C. Munchkin! on textfiles.com is excellent, for example, but it is not the same as it was originally played in 1981 on the Magnavox Odyssey² game system, with its clunky controllers hooked up to a big console television set in the living room. That difference may be insignificant for most purposes–and it is obviously wonderful to have the emulation–but anyone interested in a richer material context for it or any other obsolete software will surely want to know how it was first integrated into people’s lives.
Yes, I know that libraries and archives should not be computer museums. I was declaring that in public 25 years ago and I still say it today–but with a little less enthusiasm.
Consider the irony if future users skip libraries and archives in favor of computer museums to research old software.
The presentations I saw at Preserving.exe about what museums and media labs are doing with old hardware–including documenting elements beyond the computer, such as modems, controllers, connectors, custom keyboards, external drives, low-resolution CRTs and placement in homes–convinced me that preserving the code is only part of the story. Nick Montfort, director of the MIT Trope Tank, summed it up best: let’s have both emulation and the original hardware, where possible.
The “where possible” part is tricky, but I now believe that research libraries and archives should consider the selective acquisition of some small holdings of older equipment, perhaps examples of some of the more common platforms, for specialty use. That equipment is still fairly easy to get in workable condition, but that won’t always be the case; my guess is that it will get increasingly collectable, expensive and rare over time. Tacit knowledge about how to set up and use older equipment is also perishable. Even if only a few institutions undertake some small efforts, there will be a base for preserving a fuller range of the essential characteristics of software than emulation alone can provide.
This is a guest post by Madeline Sheldon, a 2013 Junior Fellow working with NDIIPP.
I am currently working towards a Master of Science in Information from the University of Michigan School of Information, with a specialization in Library and Information Science. In the past, I held library positions, which included working in reference services, managing the collections of two Federal Depository Libraries and conducting archival research, all of which gave me valuable customer service skills and project management experience.
My interest in digital preservation began after completing an internship with a university-sponsored digital initiatives program, where I learned to digitize analog material. At that time, I had little experience or knowledge of proper digitization or preservation practices, but my enthusiasm for my assignment led my supervisor to take me under their wing. I am so grateful to this influential mentor, because their passion for their own work sparked my interest in the emerging field and led me down my current path of study.
To better understand digital preservation, I took courses which I thought would strengthen my understanding and expertise with the preservation of digital images, management of electronic records and web archiving. While taking these courses, I further deepened my passion for the preservation of digital information, and simultaneously became a strong advocate for the careful planning and preparation of preservation policies and contingency plans produced by digital repositories.
I put the knowledge and resources gained from my courses to use in various work and school-related projects. In one instance, I worked with a group of colleagues to evaluate and analyze the records management practices of an office that dealt primarily with the creation and storage of electronic records. Because of my research and training, I felt confident that I could offer beneficial recommendations, which would not only assist with the organization’s future preservation efforts but also add trustworthiness and legitimacy to its record-keeping practices.
This summer, while working as an intern for NDIIPP, I will build upon previous Junior Fellow efforts, continuing with an ongoing digital stewardship research project. One of my main assignments will focus on finding new or recently revised preservation policies, strategies, and/or plans from cultural heritage institutions. Once gathered, I will analyze their content and generate a report of my findings for the library.
I will also contribute to this blog, which will provide updates about my progress and highlight noteworthy articles that I find as I conduct my research. I am honored to have been selected by NDIIPP and look forward to collaborating with their staff to provide useful tools and insights that will accelerate the advancements of digital stewardship for future generations.
Clifford Lynch is widely regarded as an oracle in the culture of networked information. Lynch monitors the global information ecosystem for cultural trends and technological developments. He ponders their variables, interdependencies and influencing factors. He confers with colleagues and draws conclusions. Then he reports his observations through lectures, conference presentations and writings. People who know about Lynch pay close attention to what he has to say.
Lynch is a soft-spoken man whose work, for more than thirty years, has had an impact — direct or indirect — on the computer, information and library science communities.
He began his scholarly life studying mathematics at Columbia University. He eventually shifted his focus to computer science and did some academic computing for New York University. In 1980, Ed Brownrigg — a visionary leader in library automation — invited Lynch to California to work on a large, groundbreaking project: constructing the first online union catalog for all of the holdings in the roughly 100 libraries in the University of California system, what was to become the MELVYL system.
Part of the challenge was to merge six to seven million bibliographic records and construct an online catalog. The software and hardware Lynch and his colleagues needed did not exist at that time, so they had to build custom software, a data center and an easily usable interface. The Internet did exist then but it only connected a small number of computer scientists and people doing supercomputing, so to support online access to MELVYL, Lynch and his colleagues duplicated ARPANET technology and deployed it around the state. Lynch also decided to use TCP/IP, which he says was unusual at the time and sparked disputes among some of his colleagues.
“I think we were probably the first major catalog on the Internet,” said Lynch. “And certainly one of the first systems really designed for Internet delivery rather than just handling Internet access as an afterthought.”
Along the way, Lynch became the director of the Division of Library Automation for the UC system and he got his doctorate in computer science at the University of California, Berkeley. His doctoral research was about how relational database systems failed to handle information retrieval applications. Lynch said that he thinks some of the ideas and solutions he worked out may have found their way into some commercial database systems.
In 1997, Lynch left to become the director of the Coalition for Networked Information (an NDIIPP and NDSA partner). Paul Evan Peters founded CNI in 1990 to address network technology in research and education, and to create a dialog between librarians and information technologists about areas of common interest. Lynch said that up until the founding of CNI there had been limited interaction in the academic world between the people who led libraries and the people who led information technology.
CNI did much more than promote a dialogue, though. It provided a forum for information. It alerted members of the academic, library and technology communities to key issues, pointed out things that people needed to be aware of, helped set policies and create standards (Dublin Core, for example), tracked developments and promoted strategies. Above all, it fostered collaboration among the stakeholders, which is as crucial today as it was then.
Lynch said, “In the early days, CNI spent a tremendous amount of time talking with other organizations — notably scholarly societies, government agencies and various cultural memory organizations — about how the Internet and digital content was going to change the work they do and the way their organizations needed to operate and alter their priorities and strategies. It was a consultative evangelism. We are past that now. Most organizations have at least made a first pass at coming to terms with this new world of digital information. Most people in most disciplines now would admit that their work is integrally reliant on digital data and the tools to manipulate it. And that change has taken a generation.”
He said the core conversation between libraries, information sciences and technologists is still vital but the range of participants in that conversation is broader now and the discussion is richer. Publishers, faculty, archivists, instructional technologists, artists, authors and cultural heritage communities are joining the conversation. In many ways, the Internet plays a major role in the conversation.
“The Internet is a very vibrant place today in terms of available content and resources,” said Lynch. “Although it’s also a vastly challenging place still in terms of how to organize and manage and especially preserve that content. We are still in the middle of a major re-calibration as a society about how we deal with our own memory as so much of the material that makes up that memory moves to the Internet.”
Lynch frequently mentioned the cultural record and digital preservation. He was quick to point out that the two are not mutually exclusive.
Lynch said, “One of the things I talk about nowadays is trying to understand the shape of the overall cultural record and how that shape is changing and where we are succeeding and where we are failing at coming up — as a society — with preservation strategies for deciding what we need to keep and who’s going to keep it and how it’s going to get kept.”
He said the cultural record and digital preservation help drive scholarship. Special collections set the character of research libraries, and personal materials from important individuals are a key part of these collections. The nature of these personal materials is changing radically and new approaches are needed. Lynch cited the positive examples of CNI’s work with the British Library’s Digital Lives project and CNI’s involvement with the Personal Digital Archiving meetings. He also talked about the urgent need for the general public to get more help with how to manage their personal memories.
A key issue in the acceptance of electronic-only versions of scholarly publications is long-term preservation and access to digital information. Within academia, researchers publish with the expectation that their works will be available well after their lifetime, preserved by academic and research libraries. Lynch said it is important to move digital scholarly works into preservation environments.
To help identify which issues CNI should focus on, Lynch said he and CNI’s associate executive director, Joan Lippincott, consult extensively with the CNI Steering Committee, which includes leaders from CNI’s sponsor organizations, EDUCAUSE and the Association of Research Libraries.
“Ultimately, I set that agenda,” said Lynch. “But the reality is a lot more complex because I spend a tremendous amount of time listening and talking with organizations and people with a stake in that agenda and with good ideas and insights about that agenda. A very important part of what I do is keep an ear to the ground for new things that are emerging, that are candidates for that agenda. There is an art to judging where CNI is likely to be effective and where to employ limited resources against an almost endless list of potentially interesting and important issues.”
CNI members have a lot of ability to shape its work. Some may bring forth an issue that they want to explore within the CNI community. Sometimes a particular CNI conference presentation may resonate with groups of members and an interest will develop organically.
Lynch deflected questions about his own achievements; regarding projects with which he has been involved, he always describes their successes in terms of collaboration and teamwork. I pushed him for an example of where his influence had a direct, beneficial effect, and he reluctantly mentioned the preservation of digital scholarly journals as a possible example.
“That was one that nobody wanted to deal with,” said Lynch. “And it was important to speak up and say, ‘you really can’t consider this transition done and we really can’t let go of the print until we have a viable set of strategies for preserving scientific journals.’ Then we saw the establishment of LOCKSS. We saw Portico. And there has since been some serious attention to funding models. Getting institutions — particularly universities — to think about the materials they hold in trust, in order to be able to serve as good digital stewards, about what good stewardship means, is incredibly important. And to the extent that I’ve been one of many people who pushed that discussion along and tried to make it prominent, I think has been a very good thing.”
Lynch is often on the road, nationally and internationally. He lectures, confers, observes, reads and absorbs information. Some topics radiate more importance to him than others. Some may be a continuation of a conversation he heard at a conference somewhere else, which serves to confirm its significance to him. Often he’ll end up drawing attention to whatever new and important information he has gleaned.
Information professionals seek him out not only for what he has to say but also for his skill in saying it…for his ability to explain complex information in simple, direct language.
Lynch is also a catalyst for action. He helps steer the conversation toward real results, such as standards creation, funding, tool development, metadata creation and interoperability. Ultimately, Lynch seems most fervent about collaboration as a crucial force.
“I would be reluctant to attribute much of anything just to my actions,” he said. “Most important successes come through the work of a lot of different people, collaborating and pulling it together. Maybe I can think of a place or two where there was a meeting that I spoke at or convened or I wrote or did something that just happened to fall at a pivotal moment. But any of that to me feels a bit accidental, at best just good luck, being in the right place at the right time.”
Alongside this year’s Digital Preservation 2013 meeting, I am excited to announce that we will also be playing host to a CURATEcamp unconference focused on exploring the idea of exhibition. For those unfamiliar with unconferences, the key idea is that the participants define the agenda and that there are no spectators: everyone who comes should plan on actively participating in and helping to lead discussions. Everybody who participates should come ready to work.
An exhibition involves organizing, contextualizing and displaying collection items. As cultural heritage organizations increasingly make both digitized and born-digital materials available, we find a range of opportunities for exhibiting them. If we think broadly about the idea of exhibition, everything from faceted browsing and visualizations to linear and non-linear modes of presenting materials becomes part of the interpretive framework through which users make sense of collection materials.
This CURATEcamp unconference offers an opportunity for curators, archivists, librarians, scholars, software developers, computer engineers and others to share, demonstrate and refine ideas about exhibition in the digital age.
I am excited to co-facilitate this unconference with Sharon Leon, director of public projects at the Roy Rosenzweig Center for History and New Media, and Michael Edson, director of web and new media strategy at the Smithsonian Institution.
When: July 25, 2013
Where: Alexandria, VA
Register: You can register for the meeting from the Digital Preservation conference registration page. Note that the CURATEcamp is limited to the first 100 registrants.
Potential Session Topics include:
- Open Authority and Curatorial Voice
- Online Exhibition at Scale
- Visualization as Exhibition
- Exhibiting Born Digital Objects
- Interpretation for Mobile Devices
- Digital Storytelling and Cultural Heritage Collections
- Collection Interfaces that Contextualize
- Storytelling and Linked Data
- Social Media as Exhibition
- Citizen Curators
- Blogs as Serialized Exhibits
- Data Journalism as inspiration for Exhibition