Planet DigiPres

Release date: April 18!

Files That Last - 10 April 2013 - 11:15am

I’m going through the proofread copy and making final corrections. You don’t want to know how many embarrassing typos Terri Wells has saved me from. After that, it’s a matter of getting it up on Smashwords and satisfying all their formatting requirements, and on April 18 it will be available for purchase! Everyone who pledged on Kickstarter at the $10 level or higher will get a code to download it for free.

If you’re involved in a Preservation Week event, please think about a way to include a mention of Files that Last.

I’m thrilled, if slightly exhausted, to be bringing this project to a successful conclusion. Thank you all once again for your support!


Categories: Planet DigiPres

April 2013 Library of Congress Digital Preservation Newsletter

The Signal: Digital Preservation - 9 April 2013 - 8:21pm

The April 2013 Library of Congress Digital Preservation Newsletter (PDF) is now available.

In this issue:

  • Nominate an individual, student, organization or project for the annual National Digital Stewardship Alliance Innovation Awards
  • A Digital Preservation Fairy Tale: Snow Byte and the Seven Formats
  • Interviews with: Mark Leggott, University Librarian at the University of Prince Edward Island; and Leonardo Flores, Professor of English at El Recinto Universitario de Mayagüez de la Universidad de Puerto Rico
  • Highlights from SXSW 2013
  • Viewshare highlights from Jeremy Myntti, Head of Cataloging and Metadata Services at the University of Utah’s J. Willard Marriott Library, and Violeta Ilik, Metadata Cataloging Librarian at Texas A&M University Libraries
  • Free publication, “Perspectives on Personal Digital Archiving”
  • Personal digital archiving bytes: Death, Taxes, Digital Audits and PUPPIES!; The Personal Pain of Data Loss; When Smart People Make Silly Decisions about their Files; and What Resolution Should I Use? Part 3
  • Upcoming events: Digital Cultural Heritage DC #DCHDC April 18, Washington, DC; CATCH Meeting: Archiving the Web: How to Support Research of Future Heritage? April 19, The Hague, Netherlands; 2013 IIPC General Assembly, April 22-26, Ljubljana, Slovenia; A Michigan Digital Preservation Practitioners Group, April 24, Detroit, MI; Real-World Collection Challenges with Digital Forensics Tools and Methods, June 3-5, Chapel Hill, NC; NDSA Regional Workshop, May 10, Boston, MA; and NDSA Regional Workshop, June 14, New York, NY

Challenges in the Curation of Time Based Media Art: An Interview with Michael Mansfield

The Signal: Digital Preservation - 9 April 2013 - 2:02pm

Michael Mansfield, associate curator of film and media arts at SAAM.

The following interview is a guest post from Jose (Ricky) Padilla, an intern with the NDIIPP program working on issues related to software preservation and the innovation and infrastructure working groups of the National Digital Stewardship Alliance.

This time in the Insights Interviews series we get the chance to speak with Michael Mansfield, an associate curator of film and media arts at the Smithsonian American Art Museum and a representative in the Smithsonian’s Time Based Media Art Conservation Initiative. Mansfield has contributed to exhibitions including The Art of Video Games, Watch This: New Directions in the Art of the Moving Image, and Nam June Paik: Global Visionary. I’m excited to get the chance to speak with Michael about his experience and insights on the curation of time based media art.

Ricky: Can you tell us a bit about your work at the Smithsonian American Art Museum? I would be particularly interested to hear about how your work connects with digital preservation.

Michael: I am the Associate Curator for Film and Media Art overseeing and organizing the permanent collection, acquisitions and exhibition practices for digital, electronic and moving image artworks.  Part of this work includes developing best practices for the preservation of artworks and related archive material comprised of digital and electronic materials.  Digital media is an increasingly important aspect of our cultural heritage, and at the moment, it plays two critical roles in the museum’s initiatives.  First, the institution is authoring its own digital tools to assist in preservation efforts around media artworks, both analogue and computer driven.  And second, contemporary artists are authoring artworks using new and unique digital languages.  These issues present significant challenges, but challenges that we are eager to respond to.

Ricky: Could you give us some examples of some of the pieces you have worked with? What is particularly challenging about working with time based media art? It would be ideal if you could talk us through challenges in working with particular pieces.

For SAAM by Jenny Holzer

Michael: Time based artworks are complex. A particularly challenging characteristic of time based art is that any single artwork may exist in a multitude of forms. From a preservation perspective, the art object is both the physical components – which often means an array of components – and the binary signature, which may include multiple assets. The artist’s relationship to both ‘materials’ is really very important to understanding the artwork’s place. As a curator of this collection, I need to ensure that the two remain compatible in perpetuity. One example might be Jenny Holzer’s artwork For SAAM.

It is a 28’ tall, site specific, light column suspended in the museum’s Lincoln Gallery. The column is comprised of 80,640 LEDs, managed by integrated circuits and attached to several customized circuit boards. Jenny Holzer’s texts – the content displayed on the column – exist as code running in DOS on an old laptop computer. There are eloquent relationships between her text based work, the code on the machine, and the visualization in the gallery. Managing the complexities of that on exhibit and in the collection is daunting.

Ricky: What lessons have you learned in working with this material? Further, do you think the lessons you’ve learned in this work transfer more broadly to preserving objects with software components?

Michael: There is a steep learning curve with artworks of this kind.  It is difficult to build an accurate model for handling all time based art, because artworks vary so much from piece to piece.  That’s what makes them unique.  But strategies we develop for handling one artwork certainly give us experience to draw from on the next.  With For SAAM for instance, we’re learning that the code and the components are of equal importance.  It would be unwise to migrate either part to a more stable system without accounting for the other.

Ricky: I imagine that contributing to The Art of Video Games exhibition held last year presented some opportunities to understand some of the nuances of working with time based media art in software. Could you mention other exhibitions or works which allowed you to appreciate the unique challenges of this type of curation?

David Haxton Painting Room Lights, 1980, 16 mm film, color, silent; 9:00 minutes, Smithsonian American Art Museum, Gift of the artist, © 1980, David Haxton

Michael: Yes, The Art of Video Games presented some fantastic learning opportunities around media art in software. That exhibition in one sense foregrounded the evolutionary changes in hardware, software and interactivity. And it explored how creativity arrived from within the rules of the material. On the opposite side of the coin, we are exhibiting artworks in our Watch This! gallery that experiment with those rules, challenging the very behaviors of technology and sometimes intentionally breaking technology. This is a strategy some artists employ to uncover new ideas and/or better understand ourselves. But caring for artworks that intentionally break the rules, or even visualize destruction, certainly presents its challenges from a preservation perspective. We have to preserve something so that it can be continuously destroyed.

Ricky: How does the construction of a “curatorial narrative” for an exhibition of time based media art differ from one for traditional art?

Michael: “Curatorial narratives” are a curious thing I suppose, and the material really shapes any exhibition.  I think important elements to consider when developing an exhibition of time based media are simply time and space.  Digital and time based media creates, or can create, a new performance space accessed by actors and unfolding in real time.  Shaping the ‘spatial’ relationships between the artwork and the audience, actor or player can be very informative.  In looking at art: revealing the behaviors of the artist, the behaviors of the media and the behaviors of the participant give us invaluable insight for understanding ourselves and the world we inhabit.

Ricky: I would be curious to know if there are any essays, papers or projects that you have looked to for insight on helping ensure future access to these works. If there are, I would love to hear what you found particularly useful about them.

Barn at North Fork, 2010, Peter Campus high-definition digital video, color, sound; 24:00 minutes © 2010, Peter Campus, 2011.55.1

Michael: Digital preservation among art collections is a very hot topic for museums and institutions at the moment. Organizations like museums are not known for being particularly nimble, but commercial changes in technology are really forcing the issue. So now there are incredibly smart people tackling these issues within them. A number of pan-institutional projects have been formed and they are generating great ideas published for public consumption (notably something that institutions do very well). The Smithsonian has its own Time Based Media Art Conservation Initiative that is currently investigating models for trusted digital repositories used for documenting, storing and maintaining moving image artworks. There is also the Variable Media Network started by the Solomon R. Guggenheim Museum. And, of course, there is Matters in Media Art, resulting from a fantastic collaboration between the New Art Trust and its partner museums at Tate in the UK, Museum of Modern Art in New York and the San Francisco Museum of Modern Art. Projects like these are a really compelling and inspiring use of resources.

Ricky: What different groups at the Smithsonian Institution are working on preserving this kind of time based media art? I would be curious to hear a bit about the different players and the different roles that are emerging around this material.

Michael: One thing that is very clear about “time based art” is that it would be impossible for one individual to understand every aspect of the field in all its complexity. The Smithsonian is large and includes a number of independent collections.  While each museum on campus handles their respective collections differently, we’ve come together around these issues and are finding new ways to leverage the tremendous institutional knowledge captured here.  There is a joint, time based art conservation initiative that includes representatives not only from each museum, but each discipline within the museum.  We have curators, registrars, conservators, technology specialists, mathematicians, engineers, archivists … collectively, we can tackle challenges facing our time-based art collections by communicating with truly knowledgeable experts in other fields.

Ricky: What areas do you feel we need more research or tools to support conserving this kind of material?

Michael: I’d like to find interesting ways to document the lifecycle of media artworks. This might be out of left field a bit, but artworks like this seem to live and breathe in ways that are unique in the arts and unique in their time or historical place. They grow, or shrink. They respond to their surroundings. They physically evolve. They consume. They age. They die … In some cases they reproduce. Outside of the box, I think we might benefit from some creative, comparative research with animal sciences, through their documentation of life cycles. We can look at the tools used by zoos and their conservation practices with living specimens. How do they document natural behaviors of a living creature? Perhaps this might generate some new ideas for handling something like an artwork, something that is uniquely human.


Going into final editing

Files That Last - 8 April 2013 - 9:14pm

I’ve got Terri Wells’ edits back in the mail, so now I have to make a final run through the book. After making all the corrections, there will still be work to get it up on Smashwords. My experience with JHOVE Tips for Developers, which I did mostly as a practice run, shows that it will take several revision cycles to get the book’s style to satisfy all of Smashwords’ criteria. (JHOVE Tips still doesn’t qualify for the premium catalog.) Smashwords doesn’t have any provision for submitting a book as a private draft, so please don’t buy it till I say here that it’s ready.

The amount of support that I’ve gotten on this project has been fantastic. I hope you’ll be as happy with the result as I am.



Reply to DSHR's Comment

Digital Continuity Blog - 8 April 2013 - 3:26pm

I ran over the comment limits on David Rosenthal’s blog when I tried to reply to his reply to my comment on his blog. I’ve included my reply below instead. 

Hi David,

The problem I see is that we fundamentally disagree on the framing of the digital preservation challenge. I meant to reply to your last “refutation” of Jeff Rothenberg’s presentation at Future Perfect 2012 but hadn’t gotten around to it yet. Perhaps now is a good time. I was the one that organised Jeff’s visit and presentation and I talked with him about his views both before and after so I have a pretty good idea of what he was trying to say.  I won’t try to put words into his mouth though and will instead give my (similar) views below.

The digital preservation challenge, as I see it, is to preserve digitally stored or accessed content over time. I think we can both agree that if we aren’t leaving something unchanged then we aren’t preserving anything. So, to me, the digital preservation challenge requires that we ensure that the content is unchanged over time.

Now I’m not sure if you would agree that that is what we are trying to do. If you do, then it seems we disagree on what the content is that we are trying to preserve.  If you disagree that that is what we are trying to do then at least we might be able to make some progress on figuring out what the disagreement stems from.

So if you can at least understand my perspective I’d also like to address your comments about format obsolescence. I’m not a proponent of the idea of format obsolescence. The idea makes little sense to me. However I am a proponent of a weak form of the idea of software obsolescence and, more importantly, the associated idea of content loss due to software obsolescence.

The weaker form of the idea of software obsolescence that I’m a proponent of is that because of hardware changes, software loss and loss of understanding about how to use software, software becomes unusable using current technology without active intervention.

The associated idea of content loss that I am a proponent of is the idea that to successfully preserve many types of content you need to preserve software that that content relies upon in order to be presented to users and interacted with. A stronger way of putting that is to say that in many cases, the thing to be preserved is so inextricably connected to the software that the software is part of that thing.

If you take that leap to accepting (whether fully or in order to simplify the explanation) that the software is part of the thing to be preserved, then it becomes obvious that practitioners who are  only doing migration are in many cases not doing real preservation as they are not preserving the entirety of the objects.  Hence Jeff’s presentation in which he reprimanded the community for not really making progress since the early 2000s.  Almost nobody is preserving the software functionality.

As it is relevant to your post and comments, I’ll use a web page as an example to illustrate what I mean. The content that a traditional web page presents to users for interaction is delivered using a number of digital files, including the server hosted files (e.g. the web server and applications, the HTML/XHTML pages, scripts, images and audio) and the locally hosted files such as the browser, fonts, browser skins, extensions etc. The combination of these files, mediated by usually at least two computers (the server and the client), together presents content to the user that the user can interact with. Changing any one of the files involved in this process may change the content presented to the user. To preserve such a page it is my view that we need to start by deciding what content makes up the page, so that we can both begin to preserve it and confirm that that content has been preserved and is still there in an unchanged form at a point in the future. In most cases it’s likely that all that needs to be preserved is the basic text and images in the page and their general layout. If this is all then migration techniques may well be appropriate if the browser ever becomes unable to render the text and images (though I agree with you that that doesn’t seem necessary yet or likely to be necessary in a hurry). However there are two difficulties with this scenario:

  1. There will be many cases where the content includes interactive components and/or things that include software dependencies.
  2.  When you don’t know, or can’t affordably identify the content to be preserved, preserving as much as possible, cheaply, is your best option. 

(1) means that you will require some solution that involves preserving the software’s functionality, and I believe that (2) means you should use an emulation based technique to preserve the content.

Emulation based techniques are highly scalable (across many pieces of digital content) and so benefit from economies of scale. Emulation strategies and tools, once fully realised, I believe will provide a cheaper option when you factor in the cost of confirming the preservation of the content.

It’s a bit like the global warming problem. Most products and services do not include the carbon cost in them. If they did they would likely be much more expensive. Well I believe digital preservation solutions are similar: if you factor in the costs of confirming/verifying the preservation of the content you are trying to preserve, then many solutions are likely to be prohibitively expensive as they will require manual intervention at the individual object level.  Emulation solutions, on the other hand, can be verified at the environment level and applied across many objects, greatly reducing costs.
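The cost argument above can be sketched with a little arithmetic. This is only an illustration: every figure below is hypothetical, chosen to show the shape of the comparison, and the function name is an assumption made for this sketch rather than anything taken from an existing tool.

```python
def verification_cost(units: int, cost_per_check: float) -> float:
    """Total cost of verifying `units` things at `cost_per_check` each."""
    return units * cost_per_check


# Hypothetical figures, for illustration only.
objects = 100_000                    # digital objects in a collection
environments = 5                     # emulated environments that render them
cost_per_object_check = 2.0          # manual check of one migrated object
cost_per_environment_check = 500.0   # thorough check of one emulated environment

# Migration: every object has to be checked individually.
per_object_total = verification_cost(objects, cost_per_object_check)

# Emulation: each environment is checked once, then reused for many objects.
per_environment_total = verification_cost(environments, cost_per_environment_check)

print(f"Object-level verification:      {per_object_total:,.0f}")
print(f"Environment-level verification: {per_environment_total:,.0f}")
```

Even if a single environment check is far more expensive than a single object check, the total stays small as long as the number of environments is much smaller than the number of objects, which is the assumption the argument rests on.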

So as I see it, it is not about format obsolescence, it is about (a weak form of) software obsolescence and preservation of content that can’t be separated from software.

In your post you seemed to be suggesting something similar, that content needed to be preserved that was heavily reliant upon browsers and server based applications. You also discussed a number of approaches including some that involved creating and maintaining virtual machines, and followed that with the statement that: “the most important thing going forward will be to deploy a variety of approaches”. I took that to mean you had softened a little in your attitude towards using emulation to preserve content over time.

Sorry, I seem to have misunderstood.


Opportunity Knocks: Library of Congress Invites No-cost Digitization Proposals

The Signal: Digital Preservation - 8 April 2013 - 2:53pm

This is a guest post from Vidya Vish, The Library of Congress Contracting Officer for the Third Party Digitization RFP.

Thomas Edison, full-length portrait, seated, facing front, with phonograph, Library of Congress

The Library’s collections include tens of millions of items  – not just books, but also manuscripts, monographs, serials, newspapers, pamphlets, sound recordings, films, videos, sheet music, photographs, posters, microfilm and maps. Our collections are at the heart of the Library’s mission to further the progress of knowledge and creativity for the benefit of the American people.

A critical strategy is to make our collections available, not just on-site here at the Library, but through digital copies on the Library’s website. The Library has just released a request for proposals for third party digitization – essentially seeking collaborators interested in digitizing Library collection materials at no cost to the Library. The Library invites proposals from commercial and non-commercial entities in the digital content community, such as e-book publishers or distributors, educational institutions, libraries, archives and others involved in the development of digital collections and dissemination of digital materials.

This formalizes a practice that goes back to microfilm days, where various companies reproduced our collections for their own distribution, and provided microfilm or digital copies back to the Library. Bringing in collaborators allows us to stretch our digitization resources further, making more collections publicly available faster. An open solicitation assures greater transparency and a consistent process that will be fair to all interested third parties.

Read all about it here.

And please, pass the word to anyone you know who might be interested!


Opening Up the National Digital Newspaper Program

The Signal: Digital Preservation - 5 April 2013 - 6:03pm

The following is a guest post by David Brunton, a Supervisory Information Technology Specialist in the Library of Congress Office of Strategic Initiatives.

The National Endowment for the Humanities and the Library of Congress have partnered for many years to enhance access to historic newspapers through the National Digital Newspaper Program. A centerpiece of this partnership is the Chronicling America website. With over six million pages from over thirty states, the site fulfills this commitment by publishing historic newspapers on the web.

The software that runs this centerpiece is developed in the Library of Congress’s Repository Development Center, and it is called chronam.  It is available for anyone to use: http://github.com/LibraryofCongress/chronam/. From the project README:

“The idea of making chronam available here on Github is to provide a technical option to these awardees, or other interested parties who want to make their own websites of NDNP newspaper content available.”

Around this release, we added a large number of features, and fixed some bugs as well:

  • look and feel can be easily customized
  • database size has been decreased by over 90%
  • search URLs are more cache-friendly
  • word coordinates are saved to the filesystem and delivered compressed
  • much, much more
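As an illustration of the word-coordinates item in the list above, here is a minimal sketch of storing coordinates as gzip-compressed JSON on the filesystem and handing the compressed bytes back unchanged. It is not chronam's actual code: the function names, the file name and the coordinate layout are all assumptions made for this example.

```python
import gzip
import json
import tempfile
from pathlib import Path


def save_word_coordinates(path: Path, coordinates: dict) -> None:
    """Write word coordinates to disk as gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        json.dump(coordinates, handle)


def load_compressed_bytes(path: Path) -> bytes:
    """Return the stored bytes as-is, so a web server could send them
    with a `Content-Encoding: gzip` header instead of recompressing."""
    return path.read_bytes()


def load_word_coordinates(path: Path) -> dict:
    """Decompress and parse the stored coordinates."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical data: each word maps to a list of [x, y, width, height] boxes.
example = {"lincoln": [[120, 340, 80, 22]], "gazette": [[410, 95, 96, 24]]}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "coordinates.json.gz"  # hypothetical file name
    save_word_coordinates(target, example)
    payload = load_compressed_bytes(target)
    restored = load_word_coordinates(target)
```

Keeping the file compressed on disk means the compression work happens once, at write time, and a plain file read is enough to serve each request.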

The two side-by-side screenshots illustrate the customizability: switching between them requires only a single-line change in a configuration file. On the left is our default for the Library of Congress website, and on the right is a generic view without any Library of Congress branding.

We created a public mailing list for talking about the software, and we began to publicize our work with the NDNP awardees. We are now sharing it more widely, in the hopes of furthering the mission to enhance access to historic newspapers.


Nominations Now Open for the 2013 NDSA Innovation Awards

The Signal: Digital Preservation - 4 April 2013 - 3:30pm

The National Digital Stewardship Alliance Innovation Working Group is proud to open the nominations for the 2013 NDSA Innovation Awards. As a diverse membership group with a shared commitment to digital preservation, the NDSA understands the importance of innovation and risk-taking in developing and supporting a broad range of successful digital preservation activities. These awards are an example of the NDSA’s commitment to encourage and recognize innovation in the digital stewardship community.

The 2012 NDSA innovation award winners

This slate of annual awards highlights and commends creative individuals, projects, organizations, and future stewards demonstrating originality and excellence in their contributions to the field of digital preservation. The program is administered by a committee drawn from members of the NDSA Innovation Working Group.

Last year’s winners are exemplars of the creativity, diversity and collaboration essential to supporting the digital community as it works to preserve and make available digital materials. For more information on the details of last year’s recipients take the time to examine our post on the topic. Be sure to check out our interviews with award winners Mat Kelly, the creator of WARCreate, Lisa Gregory of the State Library of North Carolina, Bradley Daigle of the AIMS Project, and Anthony Cocciolo of Pratt Institute School of Information and Library Science.

These awards will focus on recognizing excellence in one or more of the following areas:

  • Individuals making a significant, innovative contribution to the digital preservation community.
  • Projects whose goals or outcomes represent an inventive, meaningful addition to the understanding or processes required for successful, sustainable digital preservation stewardship.
  • Organizations taking an innovative approach to providing support and guidance to the digital preservation community.
  • Future stewards, especially students, but including educators, trainers, or curricular endeavors taking a creative approach to advancing knowledge of digital preservation issues and practices.

Acknowledging that innovative digital preservation stewardship can take many forms, eligibility for these awards has been left purposely broad. Nominations are open to anyone or anything that falls into the above categories and any entity can be nominated for one of the four awards. Simply put, anyone can nominate anyone and anything. This is your chance to help us highlight and reward novel, risk-taking and inventive approaches to the challenges of digital preservation.

Nominations are now being accepted and you can submit a nomination using this quick, easy online submission form. You can also submit a nomination by emailing a brief description, justification, and the URL and/or contact information of your nominee to ndsa@loc.gov.

Nominations will be accepted until May 16, 2013. The prizes will be plaques presented to the winners at the Digital Preservation 2013 meeting taking place at the Library of Congress in Washington, DC, on July 23-25, 2013. Winners will be asked to deliver a brief presentation about their activities as part of the awards ceremony, and travel funds are expected to be available for these invited presenters.

Help us recognize and reward innovation in digital stewardship. Submit a nomination now!


New look

Files That Last - 4 April 2013 - 12:32pm

I’ve just revamped the look of the blog to better call attention to the book. Let me know if you think it works or not.

The CSS on the “About” page needs reworking. I’ll get to that soon.



The Metadata Games Crowdsourcing Toolset for Libraries & Archives: An Interview with Mary Flanagan

The Signal: Digital Preservation - 3 April 2013 - 1:50pm

Mary Flanagan, Sherman Fairchild Distinguished Professorship in Digital Humanities at Dartmouth College

I am excited to continue the NDSA innovation insights interview series to talk about the Metadata Games open source software project with Mary Flanagan. Mary is an artist, scholar and designer who holds the Sherman Fairchild Distinguished Professorship in Digital Humanities at Dartmouth College and serves as the director of Tiltfactor Lab. While she is broadly involved with ongoing discussions and conversations related to digital art conservation, I am particularly interested in talking to Mary about her National Endowment for the Humanities-funded Metadata Games project.

Trevor: How do you describe the idea of Metadata Games? I’m particularly interested in hearing a bit about who you see as the audience and what you see as the goals for the project.

Mary: There’s no shortage of archival material across the world, as you know. In universities, archives, libraries and museum collections, millions of photographs, audio recordings and films lie waiting to be digitized. The British Library has warned that by 2020 vast quantities of legacy content will remain undigitized and be in danger of being forgotten. But digitization is only part of the problem. Once digitized, someone has to tag the images properly. This takes significant staff time to input. There are many collections that are very well documented and just need to be brought into the digital age. There are, however, millions of artifacts in collections which have little or no informative descriptions aside from what may be written on the archival box or photo itself. Inspired by Luis von Ahn’s research on crowdsourcing, archivist Peter Carini and I thought we should make a free crowdsourcing game toolset for libraries and archives to get some help saving our artifacts from digital oblivion. We imagined a suite of games that can quickly gather valuable tags while offering fun for players. The games are an opportunity for the public to interact with cultural heritage institutions in ways they may not have otherwise.

We have three motivations in terms of audience/players, with overlaps among them. One motivation is to assist a particular institution, or simply contribute to a good cause. These players first and foremost like the idea of helping. A second motivation for players is to play because they love a subject area – they like tagging buildings, or playing games about parts of boats, or dog breeds for example. A third player motivation is simply to win – to be the best, the fastest and most accurate.

Cultural heritage institutions with digital collections that have little or no metadata will likely benefit the most from using Metadata Games. Our hope is that with Metadata Games, cultural heritage institutions will gain useful data for their collections, assist scholars to analyze their collections in novel and possibly unexpected ways and increase engagement with the community at large.

Metadata Games is importantly Free Open Source Software, so no expensive licensing fees or contracts are required and anyone can install, use and customize its functionality. The games themselves are also designed as plugins, so they are FOSS data gathering “portals” that could be adapted to other systems as well. Ultimately we are crafting the Metadata Games kit to be useful for a wide range of institutions.

Trevor: Could you walk us through a few of the specific games? It would be great if you could give readers a few concrete examples of how gameplay happens?

A screenshot of Zen Tag, a naming activity — where participants just name what they see.

Mary: Zen Tag is the simplest possible tagging model: it is a tagging window that offers points. A player sees a single image, and uses the text box underneath to input as many tags as he or she would like – there is no time limit, and the player works at his or her own pace. Players receive points for each tag submitted, with higher points for tags that prior players have provided. Let me be clear, this one is barely a game at all. We can see how much time a particular player spends and what they input: some players love Zen Tag (or its re-skinned variations): they look at images and treat each as a world of “Where’s Waldo,” trying to describe the image in super-precise detail. Other players see this one and they want to throw their computers through a window. My mother, who I thought would enjoy simply the act of tagging images, really hates Zen Tag. She wants more competitive play. Different players like different activities! For archives, though, this little point generator can really rake in the tags. We check the entries with a variety of tools and “verification” game designs, but we have tended to get very accurate tags. There’s a multiplayer and a single-player version of this one.
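The Zen Tag scoring rule described above (points for every tag, more points for tags that earlier players also entered) might be sketched as follows. This is not the Metadata Games implementation: the point values, the normalization step and the handling of repeated tags are assumptions made for illustration.

```python
from collections import Counter

BASE_POINTS = 1    # hypothetical value for a tag nobody has entered before
MATCH_POINTS = 3   # hypothetical value for a tag prior players also entered


def normalize(tag: str) -> str:
    """Lowercase a tag and collapse whitespace so near-duplicates match."""
    return " ".join(tag.lower().split())


def score_tags(submitted: list[str], prior_tags: Counter) -> int:
    """Score one player's tags for an image, then record them in prior_tags."""
    total = 0
    seen = set()
    for raw in submitted:
        tag = normalize(raw)
        if not tag or tag in seen:
            continue  # assumption: blanks and repeats earn nothing
        seen.add(tag)
        total += MATCH_POINTS if prior_tags[tag] > 0 else BASE_POINTS
    # Record this player's tags only after scoring, so a player
    # cannot match against their own submissions.
    for tag in seen:
        prior_tags[tag] += 1
    return total


# Hypothetical tags that earlier players gave the same image.
prior = Counter({"barn": 4, "snow": 2})
points = score_tags(["Barn", "red door", "barn ", "snow"], prior)
print(points)
```

Rewarding agreement with earlier players nudges people toward tags that are likely to be accurate, while the base points still leave room for new, more specific descriptions.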

A screen shot of Guess What, a two-player game where players have to choose an image from an array of images based on clues sent to them by the networked partner.

Guess What! is a synchronous, collaborative two-player game where one player is given a particular image and must describe the image to another player across the network. The other player is given 12 images and must select the correct image based on the description. We also have speedtag, which is a theme-based beat-the-clock game. CattyGory is an Edward Gorey-inspired theme-specific tagging game. We’ve paper-prototyped a whole new set of designs and are honing these for demonstration to our project advisors as we speak. We’re focusing on mobile games, for those are what we play in those in-between moments. It would be awesome if the in-between moments were also improving the digital commons.

Trevor: Based on those examples, could you tell us a bit about what you see players getting out of the experience and what the Library, Archive or Museum gets out of it?

Mary: Players get recognition for their knowledge, they get to have fun while exploring rarely seen artifacts, and they get satisfaction in contributing –and improving– the accessibility and value of an institution’s collection. The project provides a path toward a deeper experience with the collections and the institution.

Alum Tag, an example of using the Zen Tag game to have players identify alumni in photographs

The library/archive/museum receives useful tags and valuable context for their collections, which also improves their accessibility and connections to the public. Through using Metadata Games, libraries and museums can further engage patrons, which can likely improve fundraising and attendance, in particular if they offer real world rewards or events in connection with the online games. For example, we launched a Dartmouth College-related image set under a game reskinned to focus on “AlumTag” during Dartmouth’s homecoming weekend, when many alumni are back on campus. We had instant participation and have had solid participation since with that collection. Other institutions have been keen to use the software for school fundraisers while also improving their archives.

Trevor: Could you tell us about a few of the different organizations and kinds of collections you have experimented with using the platform with? I would be particularly interested in hearing about the different kinds of orgs and their different use cases, capabilities and needs.

Mary: We started at the Rauner Special Collections Library at Dartmouth College, and in our pilot we used images from the Stefansson Collection on Polar Exploration, one of the world’s most extensive bodies of research materials on the North and South Poles. We then created other installs for our own testing, development and data gathering with a variety of image sets—some general, and some are thematic in nature, like the alumni images. We are about to set up servers for Washington University, Boston Public Library, The University at Buffalo and UC-Santa Cruz right now. The system is also running in Hong Kong successfully! Some institutions want data. Others want more engagement with the public.

We’ve been overwhelmed with interest! Folks at some institutions, though, have had a difficult time getting the go-ahead to try using the system because of institutional politics, conservative managers, or because the server folks are already too taxed. Once we walk folks through how simple the system is to install, that seems to address the latter concern.

Trevor: What projects have informed and inspired the development of Metadata Games? I would be particularly interested in hearing about particular aspects of other initiatives and projects that have inspired specific features and components of your design?

Mary: I’ve already mentioned Luis von Ahn’s work… The Library of Congress 2008 experiment with using Flickr as a possible crowdsourcing model is also excellent. We were thrilled to learn of the New York Public Library’s “What’s on the Menu?” project. The menu project is proof of two out of our three player motivations: given the right context, people will be very engaged in seemingly niche and esoteric topics, like transcribing and verifying text from old restaurant menus. I hope people will read our 2012 American Archivist article (vol. 75, no. 2) which goes into depth with these examples.

Trevor: I spoke with Arfon Smith of the Zooniverse and Adler Planetarium about their work on Citizen Science projects. I would be curious to hear how you see Metadata Games in relation to projects like the Zooniverse?

Mary: The citizen scientist is also a citizen archivist! The idea is very appealing to us. It requires trusting the public to contribute real data, real knowledge, to our archives and libraries much as they do to science. Games can engage players who initially do not care about the cultural heritage institution whose collections they’re interacting with, but as they play something like Metadata Games, they become more interested in what else the institution has to offer. Metadata Games is a way of bridging a player’s intrinsic motivations, making connections between what’s intrinsically appealing and civic engagement. Oh, and Zooniverse is awesome. My team is trying to connect this week in fact to see how we can collaborate and share.

Trevor: You guys have been working on this for a bit now, in a few different phases of funding and development. I would love to hear a bit about what you think are some of the big takeaways and lessons learned in terms of

Mary: I’ve talked a little about player motivation. One key lesson learned is in regard to expert tags vs “lowest common denominator” tags. The latter is much easier to design for… It is much more challenging to design games that increase not our base knowledge but our more expert knowledge: how do we figure out who is an expert? Who do we trust? These are really interesting research questions we’ve encountered while working on the project. Obviously we’re learning from computational linguists, but we’re also learning from humanists about these issues. A second lesson learned was about getting “too cutting edge” for institutional good. While cultural institutions have similar needs in terms of being able to quickly collect metadata for their collections, they vary widely in terms of their organizational and technical infrastructure. Finding a balance where our system is flexible and fast, but is still able to run on current systems with current levels of support, has been a key goal. The current build of Metadata Games uses software that’s available at most web hosting services. We wanted to build the system on a NoSQL database such as MongoDB, but cultural heritage institutions are typically late technological adopters. Almost every institution we spoke with said that they would be sticking with current technologies like PHP and MySQL for at least another 5 years. I was surprised to learn how many heritage institutions don’t host their own servers. We went with a solution that is familiar for now, and can be upgraded later through a plug-in architecture.

Trevor: To what extent do you see Metadata Games as a crowdsourcing or gamification project? I realize that both terms come with a bit of baggage, but both seem to capture some parts of the essence of it. So, would you define Metadata Games as a platform for crowdsourcing metadata collection and remediation? Could one talk about it as “gamified” in the sense that you are bringing game mechanics into the tool? Or do you think there is a better vocabulary we should be using to talk about this kind of project?

Mary: One could refer to Metadata Games using both of those terms, though most game designers don’t like to go near “gamification” because it implies poor game design without meaningful choice applied to corporate interests first, player experience second. We’re not just adding games to archives mindlessly; we’re really trying to address player motivation and foster a connection between the player and the collections, so folks feel a sense of ownership with the archival materials. That’s the big vision for the project: there might be archives just down your street, or in the next town, or in Washington—but for whom? They are saved for us! And for our children, and their children’s children and so on. We have a right to see what’s in there and offer what we know. The public likely knows a thing or two: perhaps someone was married in a particular park, and finds a photo of that place, untagged? An architecture geek can name the architect on this anonymous photograph of a building that otherwise would remain lost and unidentified! A veteran may be able to identify friends in archived news footage! Perhaps your great grandfather could tag plants in a photograph that just looks like a field to someone else. Perhaps your sister can identify a poet’s voice in an audio recording, or can identify the dickens out of Nascar models. Once we know base facts, we can begin to learn from what might be essential lost archives. By sending in their tags, the playing citizen really can contribute new knowledge to the records.

Trevor: Could you tell us a bit about how your team is approaching the open source software development process? Along the same lines, how are you guys thinking about the sustainability of the software you are developing?

Mary: The project is entirely FOSS. To ensure open source compliance, we use openly available frameworks and programming libraries. For the first iteration of Metadata Games, we needed to convert our earlier game prototypes created in Flash to HTML5 and JavaScript. We also try to use libraries that have an active development community to encourage dialogue and upgrading. One of the great things about an open source project is that you get to see the code that makes it work. Features can be adapted to contexts, and institutions can completely customize the system as they see fit if there is the expertise to do so. Open source is about sharing and interacting.

Ideally, we would like to see a few institutions contribute a custom plug-in or two for the Metadata Games community of users. We are working to make the APIs and documentation as flexible and easy to use as we can. An important part of FOSS work is getting the word out about the project; it is a fantastic contribution to the not-for-profit space, and it raises interesting questions and puzzles. For example, how would you create a trust algorithm? What location based game app might you build off the system? If a project is interesting, people will contribute and build on it. That’s what is meant to happen.

Regarding sustainability, we are working with the Office of Digital Humanities at the NEH on promoting crowdsourced humanities projects. We are kicking off a Humanities-specific code-sharing initiative as well, so humanists don’t have to start from scratch on developing backend databases and the like.

Trevor: How have librarians, archivists, curators and scholars reacted to the idea of metadata games? Are there different camps or perspectives that have emerged from different parties based on their feelings about authority and openness?

Mary: Overall we have been met with an extremely positive reception. The fact that the institution can “own” their own data is essential for most of our affiliates who aren’t legally allowed to share some of the collections on the internet due to copyright restrictions and such. Institutions can use the system in-house if desired, or restrict Metadata Games to a particular IP address. Most of the questions have centered around implementation: how easy is it to install, set up and maintain? How accurate is the data? How can we bring the data gathered by Metadata Games back into the collections? Do we WANT to incorporate tags back into the collection, or should we make a parallel identical collection, one for “original” data and one for crowdsourced material that is searchable and constantly updated by the public? Some groups want to integrate all of the data together and some prefer to try a “separatist” approach for at least a trial period. Either way, we’re excited to help.

Trevor: There are a lot of folks interested in inviting public participation through platforms like this into libraries, archives and museums. However, in my experience you are one of the very few working on this sort of thing who has experience as both a game designer and an artist. I would be curious to hear to what extent you think your game design and artistic perspectives come into play in the development of this platform.

Mary: We are very careful to attend to the player experience: what is it like as a player to engage with the games? This is as important as finding out how the games generate very good data. I hope that an additional phase of the project will be moving beyond the screen to engage with the space of the museum or library and sinking into some of this material deeply. I’m a closet historian at heart, and I think once people find their way into this content, they may not only contribute to the project but have a richer sense of their communities, their family histories or other cultures. I see the games as much as a means of inquiry and investigation, for us as well as for the players, as they are a play experience in and of themselves. This is one of the ways this project relates to thinking as an artist. I also think people have the right to access their own cultural heritage, and it may be a right that a lot of us have forgotten all about. Hopefully, we’ll remember soon.

Categories: Planet DigiPres

Re-tailoring FITS

Open Planets Foundation Blogs - 3 April 2013 - 12:36pm

File Information Tool Set (FITS) is the Harvard Library's "Swiss army knife" for file characterization. Created originally for use with the library's Digital Repository System (DRS), it's been made available as open source, and several other institutions have made use of it. The OPF online hackathon last November included some work on it, and recently the Google Code repository (https://code.google.com/p/fits/), which is the official home of Harvard's FITS, was cloned to a GitHub repository (https://github.com/harvard-lts/fits) as a possible step toward more community participation. There was more work on FITS at the March hackathon in Leeds, including initial work on integrating Apache Tika.

I've started work under a SPRUCE grant to continue improvements on FITS and have forked it to another GitHub repository (https://github.com/gmcgath/fits-mcgath/) for the duration of this work. (The older "openfits" repository which I created in November should now be considered deprecated; the new one is a fresh fork.) Part of this project is to get community input on what will improve FITS and, if time allows, to work it in. Among other things, I'm looking for input into what FITS video metadata should look like. There's already been some discussion of this on my own blog (http://fileformats.wordpress.com/2013/04/01/mfits/). Feel free to try out the changes as they're committed to the repository and to comment on any aspect of the project.

I'm a former software developer for the Harvard Library and currently have some sort of status as an inactive temp employee, but all remarks here are my own and not those of any part of Harvard University.

Preservation Topics: Tools
Categories: Planet DigiPres

Personal Digital Archivists: The Next Member of the Celebrity Entourage

The Signal: Digital Preservation - 2 April 2013 - 1:15pm

The following is a guest post by Tess Webre, intern with NDIIPP at the Library of  Congress.

Upon reading a piece in February’s GQ (yes, I read GQ; sometimes I have to go to the dentist), I came across a piece about a certain celebrity’s extensive archive with nary a mention of the archivist’s name. I thought they deserved some fan press as they are doing something I had not thought possible: being a celebrity’s personal digital archivist.

Awesome, by Sam Howzit, on Flickr

In an era where every celebrity is expected to have social media consultants, yogis, personal assistants, pet groomers and every other kind of service in their entourage, there is a noticeable lack of archivists.  Why is that? Everyone has records, everyone has data that they want to ensure is saved for the long term.

The answer might be in the reaction to the news of this archive. In one instance, it’s described as hoarding, in another as vanity. The perception might be that a celebrity who would spend the kind of time and money needed to keep an ever-present archive and hire an archivist is completely self-absorbed, but this is not the case. It is true that this personal archive will be much more extensive than mine, but I’ve never been on the cover of a magazine. In reality, this is just getting the digital house in order. It is difficult not to view these as aspersions on digital archiving in general, and I hope to make some corrections.

It is commendable that this celebrity has taken responsibility for her own digital assets and it should be viewed as an act of empowerment. She wishes to control the destiny of her records and understands the work necessary for this. The employment of her digital archivist proves that maintenance of digital materials is a worthwhile investment professionally and personally. The employment proves that applying standards, such as climate control, has a positive impact on the longevity of the data and should not be limited to the large institutions. The employment proves that if something is worth doing, it’s worth doing well, and personal digital preservation is no exception.

This is not a lone instance of celebrities venturing into the world of digital preservation and taking responsibility for their data. In a recent documentary, Keanu Reeves addresses the future accessibility of digital movies. Salman Rushdie created a stir when he gave old computers to the Emory Archives. The acceptance and understanding of personal digital preservation is growing, and as such, we should expect more examples of it in the celebrity world.

To the celebrity personal digital archivist, I wanted to thank you for your service to the archives profession in general and digital archives in particular. As a celebrity digital archivist there is a responsibility to prove that this is a good investment.  The more your employers make use of the archives and the more public their support of personal digital archiving, the more likely this will become a lasting trend.  It must be assured that the personal digital archivist becomes the next must-have accessory to show that archivists don’t have to exist in the basement of some large institution, but on the red carpet.

I believe you will succeed in your task of ensuring this.  You are providing an important service as a pioneer and I salute you.

Until next time, I wish you all safe data.

Categories: Planet DigiPres

Current status

Files That Last - 1 April 2013 - 7:37pm

I’ve had to change proofreaders at a late date, but I think the new proofreader will do very well. I’m still committed to getting the book out in April.

I’d changed the default page of filesthatlast.com to point at the “About” page. Unfortunately, this left no way to get to the posts page, and every solution to this that I’ve seen requires writing PHP, which isn’t allowed on WordPress-hosted blogs. I really want to attract more attention to the “About” page, which is the one that actually promotes the book, but for the moment I’ve just changed the default page back.


Categories: Planet DigiPres

From Books to Bits: Library of Congress Electronic Literature Showcase Highlights Emerging Literary Forms

The Signal: Digital Preservation - 1 April 2013 - 5:58pm

This is a guest post by Susan Garfinkel, research specialist, Digital Reference Section at the Library of Congress.

Electronic literature—past, future and present—is the focus of a free three-day program at the Library of Congress, April 3 to 5. The Electronic Literature Showcase, sponsored by the Library’s Digital Reference Section, includes a variety of events designed to raise awareness of this rich and growing field of literary expression. The showcase includes an interactive exhibit and open house, a rare book display, digital preservation workshops, literary readings and a keynote address and scholarly panel discussion, all held in the Library of Congress Thomas Jefferson Building.

Amaranth Borsuk showing “Between Page and Screen,” by kathiiberens, on Flickr

Created with computers to take advantage of their unique capabilities, electronic literature builds upon but also extends familiar forms of literary expression by bringing to them new experimentation and interactivity. Poetry, fiction, creative non-fiction, and graphic novels are all transformed by underlying computer code in ways that can’t be replicated by traditional print publication.

Words dance across computer screens while games become poems become puzzles, or readers choose their own path through multi-layered hypertext narrative or use hand-held devices to view works that are location-aware. “Electronic literature,” explains guest curator Dene Grigar, “is a hybrid art form that requires its readers to utilize various sensory modalities, such as sight, sound, touch, movement, when experiencing it.”

Central to the showcase is Electronic Literature and Its Emerging Forms, a three-day interactive exhibit of electronic literature and related printed works that highlight the major strands of influence in the field. Held in the Library’s Whittall Pavilion starting each day at 10 am, the exhibit is curated by scholars Dene Grigar of Washington State University Vancouver and Kathi Inman Berens of Marylhurst University and the University of Southern California, who have previously mounted similar displays at the Modern Language Association’s annual scholarly conference.

This exhibit introduces five major strands of electronic literary expression, pairing each of those strands with their print-format contextual antecedents, drawn from the extensive printed materials in the Library’s collections. Interactive “creation stations” allow visitors to try their hand at some of the basic techniques that have inspired electronic literary authors while a digital preservation display highlights the rapidly changing environment of electronic media and formats in recent decades. On Friday, the exhibit will be extended with items from the Deena Larsen Collection, held at the Maryland Institute for Technology in the Humanities, University of Maryland. Guest curators, Library staff and trained student docents will all be on hand to guide visitors through the interactive experience.

Mark Sample plays beta “A Slow Year” on Atari VCS, by kathiiberens, on Flickr

Additional Showcase events include “Electric Hour” readings by featured authors each day at noon and workshops on personal digital archiving. A display of rare artists’ books and early experimental printing will be hosted by curators of the Library’s Rare Book and Special Collections Division in the Lessing J. Rosenwald Room on Thursday, April 4, noon to 3 pm.

On the afternoon of April 5, the Showcase culminates with a keynote address and scholarly panel discussion. Noted electronic-literary author and scholar Stuart Moulthrop will speak on “Failure to Contain: Electronic Literature, Digital Literacy, and the State (Machine) of Reading.” Following Moulthrop, literary scholars Berens and Grigar are joined by Matthew Kirschenbaum and Nick Montfort to collectively examine the state of electronic literary curation and analysis from a variety of perspectives.

Hours for the exhibit are 10 am – 4 pm on April 3 and 4, and 10 am – 1 pm on April 5; the closing keynote and panel begin at 2:30 pm on the 5th. A complete schedule and additional information about each event are available here. Visitors to the website can also learn more about electronic literature itself, and will find links to additional resources including a permanent website created by Grigar and Berens that documents the electronic literary works on display.

4/1/2013: modified description of the author.

Categories: Planet DigiPres

Software Archiving for EaaS

Open Planets Foundation Blogs - 1 April 2013 - 2:23pm

The typical digital artefact or complex object does not function (render, execute, ...) without a certain software environment. Emulation-as-a-Service (EaaS) provides original environments running in platform emulators. Depending on the (complex) object to be handled, several software components are required to reproduce an original environment. Often, these components are proprietary and require a software license. The software itself and the licenses need to be preserved to enable the reproduction of the original environments. There are a couple of issues linked to software licenses. These issues change over time and definitely influence EaaS, as licenses (and software "patents") expire or local and remote license servers become unavailable. Another interesting point, massively disputed by some software vendors, is the development of a second-hand software market.

Software Archive of Standard Components

Software components required to reproduce original environments for certain (complex) digital objects can be classified in several ways. There is standard software such as operating systems and off-the-shelf applications sold in (significant) numbers to customers. There might exist different releases and various localized versions (the user interaction part translated to different languages as is the case for Microsoft Windows or Adobe products) but otherwise the copies were exactly the same. Such software should be described uniquely and kept in a software archive of standard components.

There are several ideas on software identification and description already discussed in this blog (e.g. by Andrew Jackson). DOIs would definitely be helpful to tag software, much as ISBNs describe books and other media. These tags would be useful for tool registries like TOTEM, too. Optimally, such software archives are managed by the relevant (national) memory institutions. As the archive's content is comparably small and well described by the tags, the workload can easily be shared (federated) among several institutions. Different ways could be envisioned to stock these archives. Legal deposit, as is well established for books and other media, is one option. Alternatively, software components could be collected on demand upon object ingest. This option is discussed and demonstrated e.g. by the bwFLA project. It provides the necessary interfaces to a software archive, so that all required software components can be collected and described. This is done via observed installation processes which record all the user interaction required to install a certain component. Such additional information is to be stored alongside the standard metadata such as license keys. The successful rendering of the object can be directly validated by the user to verify the complete capture of all relevant components.
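To make the idea of a uniquely described component more concrete, here is a minimal Python sketch of what one archive record and an on-demand ingest check might look like. It is not the bwFLA data model or interface: the class names, the field names and the example identifier are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareComponent:
    """One standard component, described once and shared by many objects."""
    identifier: str              # hypothetical persistent ID (a DOI-like string)
    name: str
    release: str
    locale: str                  # localized versions are distinct records
    license_keys: tuple = ()     # standard metadata stored with the component
    install_steps: tuple = ()    # recorded user interaction from an observed install


class SoftwareArchive:
    """In-memory stand-in for a (federated) archive of standard components."""

    def __init__(self):
        self._components = {}

    def ingest(self, component):
        # Keep the first description of an identifier; identical copies are not duplicated.
        return self._components.setdefault(component.identifier, component)

    def missing(self, required_ids):
        # Components that still have to be collected when an object is ingested.
        return [i for i in required_ids if i not in self._components]


archive = SoftwareArchive()
archive.ingest(SoftwareComponent(
    identifier="example:os/1.0/en",        # made-up identifier
    name="Example OS", release="1.0", locale="en",
    install_steps=("accept license", "choose default install"),
))
to_collect = archive.missing(["example:os/1.0/en", "example:slides/2.0/en"])
```

In this sketch, ingesting an object that needs both components would report only the presentation software as still to be collected, which matches the on-demand approach described above.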

Unfortunately, general, coordinated software archiving is still a partially unresolved issue. There are several activities going on at the National Archives of New Zealand and the National Library of Australia. These activities are very valuable to the whole community, as software producers often do not archive their products for very long. Additionally, some companies leave the market and not all assets are maintained. There exist initiatives like vetusware.com which try to tackle this problem but operate in a legally problematic domain. They might disappear because of take-down requests or simply because they run out of funding. The drive-by software archiving as run by the Internet Archive might not capture all relevant software, as many components were not freely and openly available for download. Especially for older and less popular platforms it becomes more difficult to get hold of obsolete software. Nevertheless, storing and maintaining software components is a prerequisite for reproducing original environments. Memory institutions should therefore have special rights to archive software.

Licensing

Every running instance of an original environment requires a certain set of licenses depending on the installed or used software. If, for example, a set of presentation slides with embedded audio, video and spreadsheets needs to be rendered, the licenses for the operating system and the presentation software are required. Additionally, audio and video codecs as well as an appropriate spreadsheet renderer need to be obtained and installed to make the presentation of the object complete. For EaaS a license management component is required to match the number of available licenses to the requested original environments to run. The sources of the licenses could be different and could depend on the user (and institution) requiring access to a certain digital object in its original environment. In a federated EaaS environment run by different institutions, the sharing and handling of licenses becomes an interesting topic, especially if national borders are crossed (e.g. because software vendors try to maintain separate markets with different pricing).
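The matching of available licenses to requested environments can be pictured as a simple counting pool. The following Python sketch is an assumption-laden illustration, not part of any existing EaaS implementation: the `LicensePool` class, its methods and the component names are invented here, and real licenses carry terms (user, institution, territory) that a plain counter ignores.

```python
class LicensePool:
    """Tracks how many unused licenses exist per software component."""

    def __init__(self, available):
        self._free = dict(available)     # component id -> unused license count

    def acquire(self, components):
        """Reserve one license per component, or reserve nothing at all."""
        needed = set(components)
        short = sorted(c for c in needed if self._free.get(c, 0) < 1)
        if short:
            return False, short          # report which licenses are missing
        for c in needed:
            self._free[c] -= 1
        return True, []

    def release(self, components):
        """Return the licenses when the environment is shut down."""
        for c in set(components):
            self._free[c] = self._free.get(c, 0) + 1


# Hypothetical stock: two operating system licenses, one for the presentation software.
pool = LicensePool({"os": 2, "presentation": 1})
env = ["os", "presentation"]
ok_first, _ = pool.acquire(env)
ok_second, lacking = pool.acquire(env)   # second user has to wait
pool.release(env)
ok_third, _ = pool.acquire(env)
```

The all-or-nothing reservation is a deliberate choice in this sketch: an environment that holds an operating system license but cannot obtain the presentation software would block other users without being able to render anything.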

Within the realm of (national) libraries and archives the licenses of the legal deposit might suffice. For a more open and general service other ways of licensing are required. Either the software producers offer a specific type of license for that purpose, or specifically acquired licenses (e.g. from the pre-owned license market) are used. Another option is that licenses are obtained (from the original user/producer of the object) when ingesting the particular object. This might be the case for finished (scientific) projects or end-of-life office environments in companies or government organizations. At the moment, licenses are often just thrown away like used IT equipment. For the future a more elaborate digital lifecycle management should be put in place. With the planning and beginning of a project the licensing of all required components should be secured for the complete intended lifecycle of a particular object.

Custom Made Software Components

A (federated) software archive of standard components does not make sense for all software components. In many domains custom made software and user programming play a significant role. This could be scripts or applications written by scientists to run their analysis on gathered data, run specific computations or extend existing standard software packages. Other examples are software tools written for governmental offices or companies to produce certain forms or implement and configure business processes. Such software is to be taken care of and stored alongside the preserved object. The same applies for complex setups of standard components with lots of very specific configurations. In these cases it could make sense to preserve the system as a whole (see blog post on that topic for full system preservation).

Pre-Produced and On-Demand Original Environments

EaaS makes it possible to centralize services and share the effort. This could be especially useful for re-using pre-produced original environments of standard components. Depending on the type of user (someone rendering the object within the premises of the memory institution, a commercial entity or a private person), different ways of (re)producing original environments could be chosen:

  • Complete environments together with the required metadata to run it in the chosen virtual machine or emulator. This would be the method to deploy for imaged complete systems.
  • Reproduce the complete environment from standard components using the license information delivered by the user together with the object to render. This may take a while as the setup procedure needs to be completed. The bwFLA project started to implement workflows to gather all the required metadata and user interaction to automatically reproduce such steps.
  • Re-use existing environments from a "cache" (pre-produced environments). This should be possible for in-house use or as an external service if the required type and number of licenses is available. Here a couple of legal concerns might prove problematic as many licenses may not explicitly allow software lending.
  • Partially re-use pre-configured environments if licenses are less problematic and just add the problematic/proprietary component.

Several ways to automatically reproduce certain environments have been described, e.g. for Windows operating systems (link) or as researched within the bwFLA context. Nevertheless, these procedures take time to complete and extend the time until an artefact or original environment can be presented to the user.

Preservation Topics: Resources, Corpora
Categories: Planet DigiPres

First Aid on the Front Lines: Immediate Training Needs

The Signal: Digital Preservation - 1 April 2013 - 2:22pm

The following is a guest post by Jody DeRidder, the Head of Digital Services at the University of Alabama Libraries.

Have you been digitizing and managing digital content?  Are you the go-to person in your organization for accessioning digital materials into your special collections, or collecting electronic records for your archives? More than likely, you’ve not had specific training in how to manage these files long term, and with the growing concern about digital preservation, you’re worried about how to ensure your content is safe and continues to be usable.

Across the country, schools are beginning to offer certificates, residencies and degrees in digital preservation.  That’s great!  Maybe someday your organization can hire one of the graduates.  But what can be done right now?  What about those of you already on the front lines, in the field?  You need training too: low cost or free, easily accessible and targeted to your needs.  How will you administer First Aid to make sure your digital content lasts until those specialists come along?

After the huge success of last year’s Association of Southeastern Research Libraries Intro to Digital Preservation webinars, I sent out a survey in the fall of 2012 to find out which digital preservation topics and types of material are most important to those who want to see more of these free webinars. 182 people responded, and the results were clear.  The top three topics selected were:

  • “Methods of preservation metadata extraction, creation and storage”;
  • “Determining what metadata to capture and store”;
  • “Planning for provision of access over time.”

Beyond these top three results, the responses for ASERL members (37) and non-ASERL members (142) differed somewhat. It seemed that ASERL respondents were less concerned with “nuts and bolts” than with the bigger picture; they cared more about developing selection criteria than about tackling file conversions.  While non-ASERL participants were more interested in checksums, file validations, and storage options, those from the research institutions ranked legal issues and audits above these topics.

Similar differences appeared in the comparison of the types of digital content respondents cared about most. Born-digital special collections materials were most important to both groups!  However, ASERL respondents considered digitized collections the next most critical, followed by born-digital institutional records and then digital scholarly content, with digital research data only critical to a little over half the ASERL participants. Not surprisingly, non-ASERL participants cared more about born-digital institutional records than digitized collections, and had little concern for digital scholarly content or research data.  Web content ranked lowest for both groups.

This information should help us focus our training offerings and perhaps target specific audiences. In keeping with the survey results, ASERL is again offering a free series of more targeted webinars starting on April 2 (tomorrow!) and continuing through the month, all of which will be archived within a few hours.  I hope to “see” you there!

Categories: Planet DigiPres

A month of FITS

File Formats Blog - 1 April 2013 - 10:32am

For the month of April, I’ll be working full time under a SPRUCE grant on making improvements to FITS, the Harvard Library’s File Information Tool Set. For this purpose I’ve created a fork on Github. This is a fresh fork from Harvard’s FITS repository on Github, so I’ve marked my older fork, OpenFITS, as deprecated. Harvard’s official version of FITS is still the Google Code repository.

My Github repository includes a wiki where I’ll describe progress in detail. The issues area is available for input. I don’t have any plans to address existing bugs in FITS, so please use this just for input on my work, including suggestions.

One area where I really want input is on what FITS should produce for video metadata. There isn’t a lot of consensus yet on what product-independent video metadata should look like. FITS has six different categories of files, each with its own metadata set: text, document, image, audio, video, and unknown. The metadata produced is a composite of the output from the various tools (including Tika, which I’m adding). The point isn’t to use an existing schema, but to put together a list of elements that characterize a video document. “Significant properties,” as people don’t like to say. XMP and MPEG-7 provide ideas, and most if not all audio metadata elements are also applicable to video. I’ve started a wiki page on video metadata within the Github project.

If you have an interest in shaping the output of FITS for video files, please provide input by commenting here, putting an issue on Github, emailing me, or whatever works best for you.


Tagged: FITS, metadata, Open Planets Foundation, software
Categories: Planet DigiPres

Re-Discovering and Linking Metadata in Viewshare: An Interview with Jeremy Myntti

The Signal: Digital Preservation - 29 March 2013 - 1:58pm

This is a guest post from Camille Salas, an intern with the Library of Congress.

Jeremy Myntti is the Head of Cataloging and Metadata Services at the University of Utah’s J. Willard Marriott Library. Jeremy recently gave a presentation at the ALA Midwinter conference entitled Re-Discovering and Linking Metadata in Viewshare. In the presentation, he described how he created a view from the metadata of the Western Soundscape Archive. He found that Viewshare enabled “more possibilities for creating unique experiences for users.”

Jeremy Myntti, University of Utah

CS: Jeremy, please tell us about the Western Soundscape Archive and why Viewshare is a good platform for sharing it?

JM: The Western Soundscape Archive is a unique digital collection that features audio recordings of animals and their environments in the western United States. There are currently over 2,600 items in this collection, including sounds from birds, frogs and toads, reptiles, and mammals, as well as recordings of their environments. When the user interface for this collection was developed, some unique displays were created to help users find the items that they were interested in. Some of the features of our existing interface include some basic hierarchical browsing as well as a couple of different maps to display where the sounds were recorded or habitat maps for different species in five western states.

Viewshare is a good platform for displaying some of the data from this collection since it offers several additional types of views to help our users experience the collection. This includes pie charts, additional map interfaces, tag clouds and lists, timelines, and photo galleries.

List Display of Animals in Western Soundscape Archive View

CS: Please walk us through how you used Viewshare to visualize your Western Soundscape metadata.

JM: In my presentation, there are several slides with screenshots showing the step-by-step process that I used to generate views for the Western Soundscape in Viewshare. The basic process was to export the metadata from CONTENTdm and then do some basic clean-up in Excel (e.g. removing unneeded fields and items). Since I had saved the metadata as a tab-delimited text file, it was very easy to load the data into Viewshare and begin playing around with it. I knew that I wanted to create a map view of the collection, so I used the “Add” a new field and “Augment” features within Viewshare to create the new field that would have the latitude and longitude information for the map. There were several attempts that I made at this, and I found that Viewshare was able to best augment the data and add latitude/longitude based off of a place name that was a common geographic name. Since the original metadata for this collection already had some latitude and longitude data, I needed to merge the new data that Viewshare created with the existing data. This process was done by exporting the data from Viewshare as a tab-delimited text file, merging the fields in Excel, and then re-loading the data back into Viewshare. That step was the most time-consuming step in the whole process, but it still only took a few minutes.
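For readers who would rather script the merge step than do it in Excel, here is a minimal sketch of how it might look in Python. The column names (`latitude`, `longitude`, `augmented_latlng`, `latlng`) and the assumption that the augmented field holds a single "lat,lng" string are hypothetical; they are not taken from the actual Western Soundscape export or from Viewshare's documentation, so adjust them to match the real file.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def merge_coordinates(rows,
                      existing=("latitude", "longitude"),
                      augmented="augmented_latlng"):
    """Add a 'latlng' column, preferring the original coordinates.

    All column names here are hypothetical placeholders.
    """
    merged = []
    for row in rows:
        lat = (row.get(existing[0]) or "").strip()
        lng = (row.get(existing[1]) or "").strip()
        if not (lat and lng):
            # Original pair is incomplete: fall back to the augmented value,
            # assumed (not confirmed) to be a "lat,lng" string.
            parts = [p.strip() for p in (row.get(augmented) or "").split(",")]
            if len(parts) == 2 and all(parts):
                lat, lng = parts
            else:
                lat, lng = "", ""
        out = dict(row)
        out["latlng"] = f"{lat},{lng}" if lat and lng else ""
        merged.append(out)
    return merged


def write_tsv(rows):
    """Serialize rows back to tab-delimited text for re-loading."""
    if not rows:
        return ""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Rows that end up with an empty `latlng` value are the ones that would still need a place name or manual coordinates before they can appear on a map.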

Now that the data was in Viewshare the way that I wanted it, I was able to start creating some “views” or user interfaces. I was able to use the latitude and longitude data that was in the existing metadata as well as the new data added by Viewshare to create a map pinpointing the location that each sound was recorded. I also created some pie charts for a few fields of metadata, such as the common name, order, and class of each animal. Within the views, I created some lists and tag clouds as widgets in the sidebar to be used for some simple faceting. The entire process to load the data and create the different views took about one hour.

CS: Your process reminds me of a recent interview I conducted with Meghan Frazer concerning how she “amplified” collections with Viewshare and other tools. Meghan discovered that “wanting to use one tool to do everything is not realistic.” Did you come to a similar conclusion? It’s interesting to hear similar approaches to building collections in Viewshare.

JM: Yes, I agree that doing everything with only one tool isn’t very realistic. In the current user interface that has been developed for the Western Soundscape, there are multiple tools being used to generate the pages for users to discover the collection. Now with the knowledge of Viewshare, there are more possibilities open to us for creating unique experiences for our users.

CS: One of your final slides describes the lessons you learned about using Viewshare and perhaps visualizing information in general. Share a few of these lessons and what Viewshare users might glean from them.

Re-Discovering and Linking Metadata in Viewshare Presentation

JM: After completing this project with Viewshare, there are a couple of major things that I took away. First off is that it is never too late to audit your metadata in order to see what you actually have. By loading and then playing around with a few of the views in Viewshare, I was able to see some of our metadata that is missing or that needs to be cleaned-up because there were a number of inconsistencies such as the capitalization or spelling of different terms. I have also learned that it is not hard to use existing data in new ways.

Playing around with tools like Viewshare can then give you new ideas on ways to present your data which could help our users more readily discover our collections. Using Viewshare was also very easy and user friendly. If simple tools like Viewshare can help us move more towards a linked data environment, then we don’t need to be afraid of what might be coming in the future.
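The kind of metadata audit described above can also be approximated with a few lines of code before the data is loaded anywhere. The following is only a sketch of one possible check, not the process used for this collection: it counts empty values in a field and groups values that differ only in capitalization or spacing. The function name `audit_field` is made up for illustration, and spelling differences would still need to be reviewed by eye.

```python
from collections import defaultdict


def audit_field(values):
    """Return (missing_count, inconsistent) for one metadata field.

    'inconsistent' maps a normalized key to the sorted list of distinct
    spellings that differ only by case or whitespace.
    """
    variants = defaultdict(set)
    missing = 0
    for value in values:
        text = (value or "").strip()
        if not text:
            missing += 1
            continue
        # Normalize case and collapse runs of whitespace to build the key.
        key = " ".join(text.casefold().split())
        variants[key].add(text)
    inconsistent = {key: sorted(forms)
                    for key, forms in variants.items()
                    if len(forms) > 1}
    return missing, inconsistent
```

Calling it on one column at a time, for example the common name field, gives a short list of terms to clean up.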

CS: In an Association for Library Collections & Technical Services Metadata blog post, you mention that Viewshare is helpful for thinking about new ways to present data and “help users more readily discover collections.” What kinds of users did you have in mind in creating your view? Did your eventual view prompt you to think about new groups of users? It seems like your view might be really helpful for science educators or those of us who could use a science refresher!

Pie Chart Display of Western Soundscape Archive

JM: There are many different types of users for the Western Soundscape Archive. By creating the map interfaces, we are helping users find information about animals that are within their region. The pie charts can help users as well as those working on the collection identify strengths and weaknesses of the collection in order to add more content for those areas that are lacking. These types of views can also be helpful to scientific researchers in identifying specific sounds and in searching for a particular biological classification. And like you say, this collection is also used by science educators to help reinforce the biological information of an animal by linking it to a sound and photograph.

CS: What were some of the most intriguing questions and comments by conference participants at the Midwinter session?

JM: A common comment that I received was that Viewshare looks like an interesting tool to expose your data, including any errors or inconsistencies that may reside in the data. Several people mentioned to me that they are interested in using some of their own data in Viewshare so that they can find new ways to display their data, as well as to find projects that they may need to work on or clean up. I also had a comment on how easy and user friendly Viewshare appeared to be, so even people without any programming or interface design experience can easily create their own unique views.

CS: Given your experience with Viewshare, are there any other features you would like to see that would enhance views for people with similar metadata?

JM: A major feature that I would love to see added to Viewshare would be the ability to browse through a collection hierarchically. Several fields of metadata that we have for the Western Soundscape would benefit from this so that users could browse through the biological classification data (kingdom, phylum, class, order, family, genus).

I would also like to see a feature added to edit the metadata within Viewshare without having to export the data, edit in another tool like Excel or Google Refine, and then re-import the data into Viewshare. Along with this, it would be nice to be able to link the data in Viewshare to the data that we have stored in CONTENTdm so that the Viewshare data would be automatically updated whenever we updated our CONTENTdm metadata.

CS: Thanks, Jeremy, for your time and great suggestions!

Categories: Planet DigiPres

An Intern Considers the Digital Preservation Challenge, Part 2

The Signal: Digital Preservation - 28 March 2013 - 3:19pm

The following is a guest post by Jennifer Clark, an NDIIPP intern from the University of Illinois Graduate School of Library and Information Science.

In yesterday’s post, I discussed how my initial ideas about digital preservation changed during my visit with NDIIPP. Today, I consider what I learned about building a socio-technical cyberinfrastructure.

In an ever-changing digital landscape, with limited budgets and resources, up against a variety of difficult-to-solve challenges, preservationists must learn to work smarter, not harder. The term cyberinfrastructure is often used to describe the type of infrastructure that will be needed in order to share and manage data on a very large scale. The idea of a cyberinfrastructure is one that includes a network of both technology and people, but often the conversation leans more heavily on the raw infrastructure, tools, and standards still needed to create the system. While the raw infrastructure is an essential piece of the puzzle, it shouldn’t be the entire picture.

No digital facelifts: thinking the unthinkable, by giulia.forsythe, on Flickr

Making progress in digital preservation is not just an issue of technology and tools; it’s also an issue of collaboration. This sentiment was echoed time and time again in my interviews with the staff of NDIIPP, and they agreed that while we still need to keep working on the technology problems we face, we must also work to bring communities together. Knowledge sharing is essential in digital preservation because a large portion of the work does not have any precedent, and professionals aren’t always aware of others in the field working through similar problems. The type of cyberinfrastructure we need, therefore, is not just one based on supercomputers and sophisticated software; it’s also one made of people.

NDIIPP has a number of exciting projects which assist in the building of a socio-technical cyberinfrastructure. From the creation of tools like Viewshare, which allows institutions to visualize their digital collections, to the creation of collaborations like the National Digital Stewardship Alliance, which allows institutions to come together to work through the unique problems encountered in everyday digital preservation activities, NDIIPP has been working to create a network of networks. The Library of Congress has also assisted in creating a common framework for discussion with the help of tools like the NDSA Levels of Preservation Glossary, and it provides outreach services to individuals in the community for best practices in personal archiving as well as train-the-trainer programs.

After learning about these programs in person and witnessing some of the work being done, I now believe that the future of the profession is not only to become advocates, but also to become collaborators. We can help people by speaking their own language in order to help them understand the value of preserving their digital items, whether it’s explaining to musicians the importance of keeping lossless digital originals or showing state governments how to work together in order to save money. By convincing people of the future value of their digital objects in a way that is important to them and shifting some of the work to the creation of the digital object, we not only save the objects and help other communities, but we also help ourselves by saving precious processing time and costs.

Becoming collaborators, however, comes with an extra set of responsibilities. We must view these collaborations as merely extending or building on top of existing infrastructures and workflows. All of the communities that need our assistance with digital preservation activities have existing workflows in place, whether the community is comprised of scientists or music label professionals or software engineers. The easiest way to get people to become fellow collaborators is to tap into their familiar workflows and to seamlessly integrate preservation activities, rather than trying to create a parallel workflow or impose an entirely new one.

To borrow the words of NDIIPP’s Acting Director, Leslie Johnston, “None of us should. Ever. Work. Alone. Anymore.” The problems we are facing are too fast and complex, and far too often our institutions’ budgets and resources are shrinking at an alarming pace. The only way we are going to be able to keep pushing forward is to build a new coalition of people passionate about digital preservation. Part of the shift will require that we take a hard look at our traditional roles.

In my previous post, I mentioned that librarians will have to shake off some of the traditional stereotypes in order to make progress in these new and challenging areas. We will need to be sensitive to our designated communities’ needs rather than obsessed with our own idealism. We need to help put people at a table together, give them some guidance and best practices, and get out of their way. If we pursue the authoritative librarian role, the only object we’re guaranteed to push into obsolescence is ourselves. NDIIPP’s staff and its projects provide a great model, and we can’t let them do this work alone.

Categories: Planet DigiPres

From the new OPF Chairman

Open Planets Foundation Blogs - 28 March 2013 - 11:56am

As many of you already know, I have taken over the role of Chairman of the Board of the Open Planets Foundation from Adam Farquhar as of February 1, 2012.

Clearly, Adam has already presided over an enormous achievement, first in conceiving and establishing the Open Planets Foundation, and second in bringing the OPF to the point where it is a stable, viable organisation that is both self-sustaining and debt-free. On behalf of the Board, I thank him and applaud his efforts.

Nevertheless, many new challenges lie ahead. First and foremost, we are hoping to achieve a new level of impact and financial sustainability through our new membership model.  The new model consists of three tiers of paid memberships, based on the size of the member organisation. The model also introduces affiliate members, whose contributions are made through in-kind effort. We hope that the tiered model will open the door to a much wider pool of members, which will in turn increase our visibility, impact, and community network. The challenge is to reach out to these organisations and convince them to join the OPF.

In addition, we hope that the affiliate model will help build our portfolio of software assets and increase our USP. The challenge is that this will require additional sustainable technical effort to manage the in-kind contributions effectively.

Finally, I hope that we can shape the organisation so that it can position itself and support its members in the context of Horizon 2020 and the new European funded project landscape.

I must admit that I find these challenges daunting. But I am confident that, with the cooperation of the Board, our Managing Director Bram van der Werf, the OPF support staff, and our membership, we can meet these challenges, secure the future of the organisation, and meet the requirements of our members.

I am looking forward to working with you all!

Ross King

Categories: Planet DigiPres
