I’m Rui Castro. I have worked at KEEP SOLUTIONS since 2010, where I hold the roles of Director of Infrastructures, project manager and researcher. Before joining KEEP SOLUTIONS, I was part of the team that developed RODA, the digital preservation repository used by the Portuguese National Archives.
Tell us a bit about your role in SCAPE and what SCAPE work you are involved in right now?
My role in SCAPE is primarily focused on Preservation Action Components and Repository Integration.
In Action Components, I’ve worked in the identification, evaluation and selection of large-scale action tools & services to be adapted to the SCAPE platform. I’ve contributed to the definition of a preservation tool specification with the purpose of creating a standard interface for all preservation tools and a simplified mechanism for packaging and redistributing those tools to the wider community of preservation practitioners. I have also contributed to the definition of a preservation component specification with the purpose of creating standard preservation components that can be automatically searched for, composed into executable preservation plans and deployed on SCAPE-like execution platforms.
Currently my work is focused on repository integration, where my task is to implement the SCAPE repository interfaces in RODA, an open-source digital repository supported and maintained by KEEP SOLUTIONS. Once implemented, these interfaces will enable the repository to use the SCAPE preservation environment to perform preservation planning, watch and large-scale preservation actions.
Why is your organisation involved in SCAPE?
KEEP SOLUTIONS is a company that provides advanced services for managing and preserving digital information. One of the vectors that drive us is continuous innovation in the area of digital preservation. In the SCAPE project, KEEP SOLUTIONS is contributing expertise in digital preservation, especially migration technologies, and practical knowledge of the development of large-scale digital repository systems. KEEP SOLUTIONS is also acquiring new skills in digital preservation, especially in preservation planning, watch and service parallelisation. We are enhancing the digital preservation products and services we currently support, such as RODA, and strengthening relationships with world-leading digital preservation researchers and institutions. KEEP SOLUTIONS’ participation in the project will enhance our expertise in digital preservation, and that will result in better products and services for our current and future clients.
What are the biggest challenges in SCAPE as you see it?
SCAPE is a big project, from the number of people and institutions involved to the number of digital preservation aspects covered. I think the biggest challenge will be the integration of all parts into a single coherent system. From a technical point of view the integration between content repositories, automated planning & watch and the executable platform is a huge challenge.
What do you think will be the most valuable outcome of SCAPE?
I see two very interesting aspects emerging from SCAPE.
One is the integration of automated planning & watch into digital preservation repositories. Planning is an essential part of digital preservation and it involves human level activities (like policy and decision making) and machine activities (like evaluation of alternative strategies, characterisation and migration of contents). Being able to make the bridge between these two realms and provide content holders the tools to take informed decisions about what to do with their data is a great achievement.
The other is the definition of a system architecture for large-scale processing, applied to the specific domain of digital preservation, that is capable of executing preservation actions like characterisation, migration and quality assurance over huge amounts of data in a “short” time.
The following is a guest post by Lyssette Vazquez-Rodriguez, Program Support Assistant & Valeria Pina, Communications Assistant
This is the second part of a three-part series of posts about the 2013-2014 NDSR class; read the first part here.
As part of the nine-month National Digital Stewardship Residency program, the residents recently completed their two-week digital content immersion workshop. Topics discussed included an overview of the digital landscape, how to identify and select potential digital content, and the levels of protection required for digital content, among others.
Mary Molinaro, Associate Dean for Library Technologies at the University of Kentucky Libraries, offered the workshop covering the overview of the digital landscape and the selection and review of digital content. In her workshop, she reviewed the process of identifying and selecting content that needs to be preserved to create an inventory. As part of the workshop, the residents were able to research and compare tools used for a variety of purposes in the digital stewardship and preservation lifecycle.
The workshop “Assess and Describe Digital Collections” was well received by the residents. Carlos Martinez, Information Technology Data Specialist at the Library of Congress, was a topical instructor in this workshop. He explained how it is crucial for digital stewards to be aware of the main characteristics of files and formats that need to be addressed when preservation initiatives are discussed and implemented. During the workshop, the residents were asked to explain how they would approach tackling metadata management issues and primary issues in building and maintaining a digital preservation infrastructure.
When asked to evaluate the workshops, the residents agreed that they were impressed with the engaging approach to digital preservation’s more abstract disciplines such as computer science. Having just finished their graduate degrees, they had the opportunity to refresh theory learned in library school. They agreed on the importance of learning how to use digital preservation tools on test data to complement the theory learned in graduate school.
Now that the immersion workshop is over, the residents will go to their host institutions and start working on their projects. In the next few weeks, we will bring you an update on the progress of their projects. Good luck to the NDSR inaugural class!
I've started to publish some of my notes on digital preservation. It's mostly a collection of 'war stories' and summaries of some of the little experiments I've carried out over the years, but never had time to write up properly. The idea of publishing these stories is inspired in part by XFNSTN, but also by my experience co-coordinating the AQuA workshops and from observing the success of the SPRUCEdp project.
In short, I think we need to share more war stories, not just the occasional full research paper, but also the small stuff, and the failures. Maybe I can start the ball rolling by sharing mine. I'd really like to know if anyone else out there is interested in sharing theirs.
There's a couple of bigger items on there that I think might be of particular interest:
- A long-winded data migration story about accessing data from BBC Master floppy disks.
- A description of how bitwise analysis can be used to better understand formats and the tools that act upon them, somewhat related to an OPF blog post by Jay Gattuso earlier this year.
Feedback welcome, as ever.
A recent NDIIPP intern, Ingrid Jernudd, did some research into current web resources that provide digital access to a broad array of primary source materials at the state level. She prepared a list of sites that billed themselves as general-interest portals to historical resources. Although the list is likely incomplete, I was surprised she found so many.
It is worth bearing in mind also that the list, with one exception, does not include local or municipal websites (the one exception is the Denver Public Library; its Western History and Genealogy site is included because it has resources that extend beyond Denver proper).
Most of the materials available through these websites are digitized versions of analog items. Many of the sites could, however, accommodate born digital content, as well as serve as useful models for the ongoing development of access to cultural heritage resources.
Her findings are presented in the two tables below. Table 1 lists 67 websites that relate to individual states. Table 2 lists four sites that cover multiple states. If you know of resources that are not listed here, please let us know via a comment.
Table 1, State Digital Portals
State | Resource | Website
AL | Alabama Dept. of Archives and History Digital Collections | http://digital.archives.alabama.gov/
AL | Alabama Virtual Library | http://www.avl.lib.al.us/
AK | Alaska’s Digital Archives | http://vilda.alaska.edu/
AR | Arkansas History Commission | http://www.ark-ives.com/
AR | Arkansas State Library Digital Collections | http://cdm16039.contentdm.oclc.org/cdm/
AZ | Arizona Memory Project | http://azmemory.azlibrary.gov/
AZ | Arizona Cultural Inventory Project | http://cip.azlibrary.gov/
CA | Calisphere | http://www.calisphere.universityofcalifornia.edu/
CA | California State Library Online Resources | http://www.library.ca.gov/services/online-resources.html
CO | Denver Public Library, Western History and Genealogy | http://digital.denverlibrary.org/
CT | Connecticut Digital Collections | http://www.ctstatelibrary.org/dld/pages/connecticut-digital-colle
DE | Delaware Public Archives | http://archives.delaware.gov/
FL | Florida Memory | http://www.floridamemory.com/
GA | Digital Library of Georgia | http://dlg.galileo.usg.edu/
GA | Georgia’s Virtual Vault | http://cdm.georgiaarchives.org:2011/cdm/
HI | Hawaii State Archives Digital Collections | http://archives1.dags.hawaii.gov/gsdl/cgi-bin/library
IA | Iowa State Historical Society Digital Archives | http://www.iowahistory.org/libraries/index.html
IA | Iowa Heritage Digital Collections | http://www.iowaheritage.org/
ID | Idaho State Historical Society Digital Collections | http://idahohistory.cdmhost.com/
IL | Illinois Digital Archives | http://www.idaillinois.org/
IL | Explorie Illinois: Illinois Digital Archives | http://www.finditillinois.org/ida/
IN | Indiana Memory | http://www.in.gov/memories/index.html
KS | Kansas Historical Society State Archives | http://www.kshs.org/p/state-archives-library/11933
KY | Kentucky Digital Library | http://kdl.kyvl.org/
LA | LOUISiana Digital Library | http://louisdl.louislibraries.org/
LA | LDMA (Louisiana Digital Media Archive) | http://ldma.lpb.org/about-ldma
MA | Digital Commonwealth: Massachusetts Collections Online | http://www.digitalcommonwealth.org/
MA | Massachusetts Board of Library Commissioners Digital Collections | http://mblc.state.ma.us/books/digital/
MD | Maryland Historical Society Online Collections | http://www.mdhs.org/museum/collections-online
MD | Archives of Maryland Online | http://ow.ly/p8kO4
ME | Maine Memory Network | http://www.mainememory.net/
MI | Seeking Michigan | http://seekingmichigan.org/
MN | Minnesota Digital Library | http://www.mndigital.org/
MO | Missouri Digital Heritage | http://www.sos.mo.gov/mdh/
MS | Mississippi Department of Archives and History: Digital Archives | http://mdah.state.ms.us/arrec/digital_archives/
MS | Upper Mississippi Valley Digital Image Gallery | http://www.umvphotoarchive.org/
MT | Montana Memory Project | http://mtmemory.org/
NC | North Carolina Digital Collections | http://digital.ncdcr.gov/
NC | North Carolina Exploring Cultural Heritage Online | http://www.ncecho.org/
NC | North Carolina’s Digital Collections | http://digitalnc.org/collections
ND | State Historical Society of North Dakota Digital Resources | http://www.history.nd.gov/archives/digitalresources.html
NE | Virtual Exhibits of the Nebraska State Historical Society | http://nebraskahistory.org/exhibits/index.shtml
NJ | New Jersey Digital Highway | http://www.njdigitalhighway.org/
NM | New Mexico State Library Digital Archive | http://ow.ly/p8hrp
NV | Nevada Statewide Digital Initiative | http://nsla.nevadaculture.org/
NY | New York Heritage Digital Collections | http://www.newyorkheritage.org/
NY | New York Department of Records Photo Gallery | http://www.nyc.gov/html/records/html/gallery/home.shtml
NY | New York Public Library Digital Gallery | http://digitalgallery.nypl.org/nypldigital/explore/
OH | Ohio Memory | http://www.ohiomemory.org/
OK | Oklahoma Digital Prairie | http://digitalprairie.ok.gov/
OR | Oregon Digital Library Project | http://odl.library.oregonstate.edu/record/search
PA | Digital Collections at the State Library of Pennsylvania | http://ow.ly/p8hv3
PA | Historical Society of Pennsylvania Digital Library | http://digitallibrary.hsp.org/
RI | State of Rhode Island Virtual Archives | http://sos.ri.gov/virtualarchives/
SC | South Carolina Digital Library | http://www.scmemory.org/
SD | South Dakota State Historical Society Digital Archives | http://sddigitalarchives.contentdm.oclc.org/
SD | Digital Library of South Dakota | http://dlsd.sdln.net/index.php
TN | Tennessee State Library and Archives: Digital Collections | http://www.tennessee.gov/tsla/resources/index.htm
TN | Volunteer Voices | http://www.volunteervoices.org/
TX | Northeast Texas Digital Collections | http://dmc.tamu-commerce.edu/
UT | Digital Utah | http://pioneer.utah.gov/digital/utah.html
VA | Virginia Memory | http://www.virginiamemory.com/collections/
VT | Vermont Folklife Center | http://www.vermontfolklifecenter.org/digital-archive/collections/
WI | State of Wisconsin Collection | http://uwdc.library.wisc.edu/collections/WI
WV | West Virginia Division of Culture and History Online Exhibits | http://www.wvculture.org/museum/exhibitsonline.html
WY | Wyoming Memory | http://www.wyomingmemory.org/
Table 2, Multi-State Digital Portals
States | Resource | Website
CO, NM, WY | Rocky Mountain Online Archive | http://rmoa.unm.edu/
MN, ND | Digital Horizons: A Plains Media Resource | http://digitalhorizonsonline.org/
UT, NV, ID, AZ, HI | Mountain West Digital Library | http://mwdl.org/
Various | Digital Public Library of America | http://dp.la
Update: Corrected link to the Vermont Folklife Center.
If you’re in DC this weekend make sure to stop by the 2013 Library of Congress National Book Festival on the National Mall. Authors, poets, illustrators and several Library of Congress programs will be featured over two days, Saturday and Sunday, September 21 – 22, 2013. NDIIPP staff will be in the Library of Congress Pavilion (on Sunday only) with information and handouts about what we call Personal Digital Archiving: tips and guidelines on how people can keep safe their own digital photographs, documents, music, email and other digital information.
NDIIPP has been sharing these ideas at the NBF since 2006. One of the most popular parts of our exhibit is the array of old storage discs and outdated computers we use to represent the constantly changing digital environment. Often, the computer punch cards on display bring back memories of loading reams of the cards into computers that ran fairly simple operations. It’s amazing to think about how far technology has come and how much it has changed our everyday lives. With these dramatic changes we’ve all had to learn new things: how to use a computer, a digital camera, a mobile phone, email, the Internet. We also have to learn how to keep the output of these new technologies and devices so that the generations after us can know what we experienced: our story.
So, if you’re interested in learning more about saving your digital stuff, or just want to walk down memory lane with storage discs from your past, stop by on Sunday between 10 a.m. and 5 p.m.! We’ll be in the Library of Congress Pavilion. At 4:20 p.m. on Sunday, Bill LeFurgy will be presenting Preserving America’s Digital Heritage: The National Digital Information Infrastructure and Preservation Program.
The following is a guest post by Carlos Martinez III, a program support assistant in the Library of Congress Office of Strategic Initiatives and a recent graduate of the Catholic University of America’s Library and Information Science Masters Program.
Digital technologies have become an integral part of everyday life, influencing and changing the way information is searched, retrieved, accessed and preserved. Over the past decade, there has been a major shift in the types and formats of information resources people seek, leading to changes in the way new library and information science professionals prepare for the current marketplace. This shift has created opportunities in information technology roles, such as digital archivists, repository librarians and metadata specialists, alongside the more traditional roles of reference or cataloging librarians. Given this shift, I have often contemplated the critical skills library and information science professionals need to obtain today, both in graduate school and in the professional arena.
As a recent graduate and based on my recent work experience at the Library of Congress, I hope to shed some light on what having a “modern” library job entails, offer some thoughts on the types of skills I have concluded are necessary for librarians in today’s information environment, and offer some advice to emerging professionals.
Even with recent discussion on the gaps that exist among professionals in the workplace, “library science” coursework is critical to understanding the profession as a whole. The traditional skills of librarianship, like cataloging and reference services, are still vital to the profession. For example, the information that is accessed via search engines was built on taxonomies and controlled vocabularies, mainstays of librarianship. During my master’s coursework, I took introductory and advanced courses in cataloging and classification. The theoretical concepts and practices taught in these courses are highly applicable to describing information at different levels of granularity for access.
The digital age has also affected acquisition and collection development policies in libraries. Librarians must now consider how to effectively manage digital information resources, while maintaining physical collections. Taking collection management or development courses will prepare you to think critically about how to apply the theoretical principles within your library or information center. In one of my classes we discussed the importance of updating collection development policies to provide access to digital information resources. Acquiring and providing access to electronic materials for patrons both on and offsite is a critical component of maintaining a useful collection.
All that said about theoretical coursework, it’s critical to take courses related to information system design and analysis. In light of previous discussions on this blog, it would be useful for new librarians to understand data as collections and, for example, to be able to manipulate and manage large data sets. A lot of the work that I do at the Library of Congress involves metadata remediation, data management and data migration. The courses that I feel prepared me most for this work were database management and programming for web applications. These courses offered a solid introduction to managing data in a relational database, and to writing code for applications to create and access data.
Without a doubt, the work that librarians perform has been changing as access to free digital information becomes more prevalent. It is important to pursue a curriculum that allows you to be comfortable providing user support, reference services, reader advisory, and the myriad of skills associated with librarianship in a variety of formats. The best way to earn this experience is through an internship or a practicum.
Last summer, I completed an internship at the National Archives and Records Administration in the Center for Legislative Archives through the Hispanic Association of Colleges and Universities National Internship Program. During this internship I migrated legacy data from House and Senate records into structured metadata to facilitate access for users. I also shadowed a reference archivist, and assisted him in providing reference support services to patrons visiting the Archives. Before the internship was over, I began answering both virtual and in-person reference requests independently.
I started a second HACU internship at the Library of Congress Repository Development Center of the Office of Strategic Initiatives. Being a member of the RDC challenged me to learn about some of the problems librarians are facing with preserving digital information resources, such as digital content transfer and media degradation. I had the opportunity to meet with several stakeholders within the Library, and help develop a set of data elements necessary for the acquisition of electronic journals through a system called e-Deposit.
After my internship with RDC I began working as an employee in OSI’s Integration Management office. My primary responsibilities are in the area of metadata remediation and management. I help develop metadata according to the new web framework guidelines for a variety of online content ranging from digitized materials to static web pages.
Over the past couple of years, Integration Management partnered with the Interpretive Programs Office to migrate online exhibitions into a new web framework. An example of this work can be seen by comparing the Internet Archive’s version of the Thomas Jefferson exhibit with the current online exhibit. The new framework allows users to access catalog records associated with digital objects, and creates page-level metadata that will refine the online catalog’s search capabilities.
Combining Theory with Practice
In my personal experience, learning how to provide traditional library services (like reference services and cataloging) is important, but capitalizing on the opportunity to develop a technical background while in library school is equally critical. The most valuable aspect of completing my graduate coursework was learning the principles of the profession and absorbing its values, because the core mission of librarianship has not changed.
For an emerging information professional, the most important theoretical principles are centered on becoming familiar with authority control in the digital age, the ability to manage and manipulate large sets of data, and understanding the challenges of preserving physical and digital formats. New librarians need to possess the ability to assess and describe collections like traditional librarians, but they also need to know enough about technology to successfully curate digital collections in the information age. While it is important to have this knowledge when entering the profession, it is equally important to have had practical experience applying it. The experience you gain will not only prepare you for the workplace, but will give you an edge in applying theory to practice.
The John W. Kluge Center at the Library of Congress has announced a new set of Kluge Fellowships in Digital Studies to examine the impact of the digital revolution on society, culture and international relations using the Library’s collections and resources. I am thrilled to have the chance to talk with Jason Steinhauer, Program Specialist with the Kluge Center, about how this unique opportunity could fit with ongoing scholarship and research in digital stewardship.
Trevor: Could you give us a quick overview of the fellowship? What are the key points for anyone interested in it?
Jason: Sure. This is a call to scholars and thinkers worldwide to examine the digital revolution’s impact on how we think, how we live and how we relate to one another. Digital technology has made its way into every facet of our lives. Although it may be too early to fully know what the impact of the digital revolution is, it’s not too soon to ask the question. We hope to catalyze thinkers and scholars to take a step back, take a broad look at the evidence of the digital revolution’s effects on our lives and look deeply to see if something has fundamentally shifted. If so, what is it? What does it mean for us? What are the implications, positively or negatively? We hope to bring great minds to the world’s greatest repository of knowledge to investigate these questions.
Trevor: A recurring theme on The Signal has been bringing data science and computational analysis to bear on cultural heritage collections. For example, work funded through the interagency Digging into Data grants program often falls into this area. Would this call be an opportunity for data scientists and computer engineering researchers to develop that kind of corpus analysis research on things like the more than 30 million online documents in the National Digital Library mentioned in the call? I would be curious to hear you explain a few of the kinds of things you might imagine scholars could propose in this vein. Further, could you give us a sense of what would make this kind of proposal strong and compelling to reviewers?
Jason: Well, it’s a wide topic and applicants can approach the subject any number of ways. Most important, though, is to ensure that proposals address questions of deep concern to the humanities and the social sciences. We’d encourage scholars to go beyond data science and computational analysis and think about the digital revolution’s impact on language, education, communication, our thought patterns, on our values. Is the digital revolution ushering in a fundamental change in how we communicate, for example?
Some scholars speculate that the language of computer programmers may become the lingua franca of the future. Is that one of the implications of the digital revolution? Are there other implications for language, as more and more exchanges between people and nations are conducted through digital means? Is the digital revolution fundamentally shifting our values? If so, how? We want scholars who are willing to think deeply and critically about the implications of this massive transformation using the Library of Congress collections, as well as additional resources in Washington.
Trevor: Reading the call I thought of two very different streams of digital scholarship that might fit into it. On the one hand, there is work in new media studies that focuses on close readings and analysis of digital materials and their histories. On the other, there is work in the digital humanities that focuses on computational analysis of digitized collections of existing primary sources from earlier eras. Matthew Kirschenbaum talked about these different streams of research in a recent interview. Are both of these kinds of research projects in play for the fellowships? If so, could you provide a sense of how these very different kinds of proposals would be evaluated against each other?
Jason: It’s best to think of this as a humanities fellowship that critically explores the digital revolution’s impact on our lives. Not to say that digital scholarship is not interrelated to this, but a deeply-rooted humanities framework may be most helpful in crafting a proposal.
In terms of evaluation, all applications to the Kluge Center are evaluated against five criteria: the significance of the contribution that the project will make to knowledge in the specific field and to the humanities or social sciences generally; the quality of the applicant’s work; the quality of the conception, definition, organization and description of the project; the likelihood that the applicant will complete the project; and the appropriateness of the research for the Library of Congress collections.
We hope to offer up to three fellowships in the first year of this competition and the three selectees may take very different approaches. We’re hoping to see a lot of differing, creative approaches to the topic.
Trevor: The call specifically mentions the Twitter archive. Do you have a sense of the kinds of access scholars submitting proposals would have to the Twitter corpus?
Jason: The Twitter archive is a new kind of collection for the Library of Congress. Archiving and preserving outlets such as Twitter will give future researchers access to a fuller picture of today’s cultural norms, dialogue, trends and events in order to inform scholarship, the legislative process, new works of authorship, education, and other purposes.
The Library has received billions of tweets and corresponding metadata to date, and is now working to develop a stable and sustainable way to preserve and organize the collection. In the near term, the Library is working to develop basic levels of access for on-site researchers and scholars-in-residence. We anticipate this Kluge Fellowship in Digital Studies to be an ongoing program, so we felt it appropriate to mention the Twitter archive as a potential resource, even though the full functionality may not be in place by the time the first fellows arrive. Scholars should not base their proposals around the Twitter archive, but rather consider it as one of the resources to mine while here at the Library of Congress.
Trevor: Aside from its born-digital and digitized collections, the Library of Congress has extensive holdings of personal papers and other primary sources that would seem to offer considerable value to answering questions about the impact of the digital revolution. Off the top of my head, something like John Von Neumann’s papers comes to mind. To help spark potential researchers’ imaginations, do you have any thoughts on particular Library of Congress collections that might be ripe for this call?
Jason: This is a great point. The Library of Congress has 35 million books, millions of manuscripts, moving images, sound recordings, digital collections, journals, newspapers, oral histories, the general humanities collections, the Law Library collections, the records of the U.S. Copyright Office, the holdings of the Science, Business and Technology Division, the writings of 20th and 21st century writers and thinkers… depending on the research question proposed, any number of these collections could be appropriate. The sky is really the limit.
Trevor: The Library of Congress has a sizable collection of video games at the Packard Campus A/V Conservation facility. Would proposals that focused on studying this collection of video games and related materials be relevant to this fellowship? Assuming they are, what would a proposal to study these materials need to establish to be compelling?
Jason: That’s a great idea. The effect of video games and online simulation on the cultural and societal norms that shape our lives has been seismic. The video game collection could certainly support a proposal; the proposal should indicate how these collections would inform the larger research question.
Trevor: Are there any key final words or thoughts that you want to stress about the program?
Jason: This is a unique moment for the Kluge Center and the Library of Congress. We have an opportunity to step back and ask important questions about ourselves and how we relate to one another in this new digital world. The insights from the scholars and practitioners we bring to Washington will open numerous possibilities for programs, symposia, seminars and more, to explore with policymakers and the public what the digital revolution means to us and future generations.
This fellowship is just the start. We hope people across the world will join us—including The Signal—and those interested should subscribe to our RSS feed on our home page, as well as check out the Digital Studies Fellowship page on our website. Thanks for letting us share this announcement with your readers, and we hope that some of them will apply!
Here's a little news bulletin about FIDO, the open source file format identification tool of OPF.
It seems that the use of FIDO has been growing over the last few months. I am getting responses by e-mail and through the GitHub issue tracker from all over the world, ranging from requests for help to suggestions for improvement and even some bug fixes. Thanks and please keep them coming!
Most important change currently is the versioning schema of tagged releases.
If you forked FIDO or watching the tags for updates, please notice that the versioning schema has changed from [major].[minor].[patch] to [major].[minor].[patch]-[PRONOM version number].
The reason for this is that from time to time there is a new PRONOM version available but there are no code changes to commit. As it is bad practice to update a tagged release this was the only reasonable way to fix this.
For example, release 1.3.1 has PRONOM version 70 distributed with it and is tagged '1.3.1-70'.
If a PRONOM update is available but there are no code changes, the next tag will be '1.3.1-71'. Please note that this is only reflected in release tags; FIDO itself will still report only its version number, without the PRONOM version number.
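If you watch the tags from a script, the new scheme can be split into its parts with a few lines of Python. This is only a minimal sketch and not part of FIDO: the names `ReleaseTag`, `parse_release_tag` and `is_pronom_only_update` are made up for illustration, and the sketch assumes that tags look exactly like the examples above (digits separated by dots, with an optional PRONOM number after a hyphen).

```python
import re
from typing import NamedTuple, Optional


class ReleaseTag(NamedTuple):
    """Hypothetical container for the parts of a release tag."""
    major: int
    minor: int
    patch: int
    pronom: Optional[int]  # None for tags in the old three-part scheme


# Assumed tag shape: [major].[minor].[patch] with an optional -[PRONOM version].
_TAG_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-(\d+))?$")


def parse_release_tag(tag: str) -> ReleaseTag:
    """Split a tag such as '1.3.1-70' into its numeric parts."""
    match = _TAG_RE.match(tag.strip())
    if match is None:
        raise ValueError(f"unrecognised release tag: {tag!r}")
    major, minor, patch, pronom = match.groups()
    return ReleaseTag(
        int(major),
        int(minor),
        int(patch),
        int(pronom) if pronom is not None else None,
    )


def is_pronom_only_update(old: str, new: str) -> bool:
    """True when the code version is unchanged and only PRONOM moved forward."""
    a, b = parse_release_tag(old), parse_release_tag(new)
    if a.pronom is None or b.pronom is None:
        return False
    return a[:3] == b[:3] and b.pronom > a.pronom
```

With this sketch, `is_pronom_only_update('1.3.1-70', '1.3.1-71')` is true, while a change in the patch number is treated as a code release.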
Currently I am also working on the FIDO usage guide. It is still a work in progress, but it could help you on your way using FIDO.
I'll be the first to admit that FIDO is still far from being "the perfect file format identification tool". Although it is quite stable and many things have been improved or fixed lately, such as the handling of files passed to STDIN and the possibility of using only the official PRONOM signatures, it still needs improvement on many levels.
Recently Carl Wilson (OPF technical lead) and I started thinking about what needs to change for FIDO version 2. This second generation of FIDO will not differ much in functionality from the current version 1 generation, but the way we plan on doing things will make a big difference. For starters, we will be creating unit tests for every function of FIDO. The second important thing is unit testing of individual PRONOM signatures and PRONOM container signatures. With each update of PRONOM we will run unit tests using corpora files.
But the biggest change of all will be the way we build FIDO. It will no longer be just "a script", but rather an API. The "fido.py" script will then merely serve as an example of how to build your "own" FIDO into your workflow systems. It will also no longer output to STDOUT and STDERR but will return results in a more Pythonic way. You will read more about all this in a later post.
In the meantime, I (with a little help from you) will continue improving version 1 where possible. If you have any questions or suggestions about any of the above, please let me know.
I’m sure these types of classes will become standard at every library at some point. – Jon Eriksen
Public libraries are becoming the front lines in the spread of digital literacy. This is evident in the calls for action contained in the Institute of Museum and Library Services’ “Building Digital Communities” guide and in the increasing volume of topics about “digitization” and “digital libraries” at ALA conferences.
“Digital literacy” in the context of public libraries refers mainly to empowering people to use computers and the Internet in order to improve their quality of life and job prospects.
But another aspect to digital literacy that is slowly gaining ground is understanding how easy it is to lose access to digital possessions and how that loss can be prevented. Some inspired public library staff are teaching personal digital archiving to their communities. These librarians realize that personal digital archiving is of crucial value to everyone in their community and they are stepping up and doing something about it.
Jon Eriksen is one of those inspired volunteers. Eriksen is a technology and reference librarian at the New Canaan Public Library, and since the fall of 2012 he has been organizing personal digital archiving events in public libraries in and around New Canaan, Connecticut. I contacted Eriksen and asked him about his outreach work.
What motivated you to organize these presentations?
When I was working on the research for my master’s thesis, which used a personal information management perspective to look at artists’ uses of their personal collections, I got interested in personal digital archives and collections.
When I would talk to people about my thesis I ended up having all of these great conversations about keeping digital materials organized, safe and accessible. It became clear to me that anyone with a PC or a digital camera struggles with these issues to some degree.
I think I was interested in a particular duality – the data that we keep is personal, both in the sense that what we save reflects what we care about, and also because how we organize and care for our materials says a good deal about our preferences and priorities.
At about that time, I first came across the Library of Congress Digital Preservation website and read about personal digital archiving events. I thought that these events seemed like a great way to highlight the issues I was interested in and since I’d already spoken with so many people I felt certain that there would be an interest in the community – and I thought it could be a very fun class to teach. We had a great response in our community to the first couple of classes we offered and that motivated me to offer it to other libraries as a part of our outreach program.
I really felt like my library supported the effort and I think that was crucial to our success. For the first event, we designed and printed postcards and flyers, and we really put effort into marketing it on our website and newsletter. That initial effort paid off by drumming up interest and from the start our events were well attended.
You originally gave a two-hour presentation but cut your subsequent presentations to about an hour. Why?
When I did the presentation for the first time I covered a lot of ground but felt like it was too long and in-depth for one sitting. I thought it would make more sense to view the personal digital archiving class as a foundation for other courses and we could offer separate in-depth technical classes on scanning, e-mail archiving and other topics.
I think that this structure works better since it allows, in the first course, more focus on the mapping and planning piece of creating an archive, using the four-step strategy promoted by the Library of Congress – identify, select, organize, and save. I think that the four-step strategy has been really useful, both because it gives a good overview of a project and because it breaks down the whole endeavor into separate and manageable tasks, and that makes it easy to talk about the hurdles you might meet at each step.
When I don’t try to cover both theory and technique I can also leave a lot of room for questions during the presentation and that can be important when the audience has varied archival materials and varying degrees of digital literacy.
What Library of Congress resources have you found most helpful to you and the general public?
When I was preparing for my first presentation, I consulted all of the Library’s material in the personal archiving section. I found that the step-by-step guides, in particular, provided a good framework for the class and made it easier for me to organize all of my materials and talking points. Now, I also list the personal archiving page as the recommended resource for patrons when they need more information or get stuck, and I still use the digital formats website when I feel like I want more specific information on formats.
Having such a unified resource is great and my next step here at the New Canaan Library will be to create a dedicated personal archiving web resource for our community, both to highlight these issues and also to draw connections between local resources, equipment and programming. I hope that this will make it easier to connect one-to-one with our patrons, and to provide a more individualized resource for people who are just starting out.
You said that some topics you need to explain by demonstration and some topics you can explain with a handout. Can you give me examples of both?
At the end of each presentation, I demonstrate how to create a folder structure because I think it’s helpful to have a visual demonstration – what a well-planned and accessible archive will actually look like. I then scan a photo and import a few more photos, just to show how to rename and sort objects into the folder structure. Finally, I show a couple of examples of archive instructions and inventories.
By this point I have talked people through the four-step strategy, and I think that the demonstration at the end of the class ties everything together and helps people feel that it will be easy to get started. People leave the course with a couple of printouts – a simple step-by-step guide and a list of recommended formats for archiving. I get a lot of questions about things like recommended resolution for scanned images, and the step-by-step guide makes it easier to get started at home.
Why do you think it is important for people to create an inventory at the top level of the folder? Do you think they will actually do it? What information should be in it?
One of my key insights from browsing through other people’s computers during the interviews for my thesis was that the way we organize our things reflects, to a certain extent, the way we think about the world. This helps us to navigate our own material, but can pose a problem when someone else tries to make sense of that archive.
People only rarely consider that someone else might look at their personal materials, but when you speak with them most people don’t want their photos, records and correspondence to be unavailable to their children, to future generations. I think it’s really important to be clear and systematic when you create a personal digital archive. Keeping an easy-to-find guide or inventory in a top-level folder makes your materials so much more accessible for someone else.
To further future-proof the archive, I recommend that people include system and software information and that people keep a log where they record decisions about naming conventions and preferred formats so that those decisions don’t have to be made every time they add material to the archive. And if people include scanned material, I think it’s a good idea to include a note about where the originals are stored.
I know that most people won’t do all of these things – I don’t always do them myself. But to include at least a rudimentary description of the archive is such good practice that I have to talk about it. I find that people aren’t generally thinking about someone else going through their digital materials at all, so talking about these issues and writing out a description allows people to actually imagine someone else trying to navigate their archive.
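As a rough illustration of the inventory idea, the short Python sketch below writes a plain-text inventory into the top-level folder of an archive. It is only one possible approach under stated assumptions: the file name `INVENTORY.txt`, the function `write_inventory` and the layout of the file are invented for this example and do not come from the Library of Congress guidance.

```python
import platform
from datetime import date
from pathlib import Path

# Hypothetical file name for the top-level inventory.
INVENTORY_NAME = "INVENTORY.txt"


def write_inventory(archive_root, notes=""):
    """Write a simple inventory of every file under archive_root.

    The inventory records the date, basic system information and
    free-text notes (for example naming conventions, preferred formats,
    or where the originals of scanned items are stored).
    """
    root = Path(archive_root)
    lines = [
        f"Inventory of {root.name}",
        f"Generated: {date.today().isoformat()}",
        f"System: {platform.system()} {platform.release()}",
        f"Notes: {notes}",
        "",
        "Files:",
    ]
    # List files relative to the archive root, skipping the inventory itself.
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.name != INVENTORY_NAME:
            lines.append(f"  {path.relative_to(root).as_posix()}")
    inventory = root / INVENTORY_NAME
    inventory.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return inventory
```

Running `write_inventory("Family Archive", notes="Originals of scanned photos are in the hall closet.")` after each batch of additions keeps the description current without much effort; the folder name and note here are placeholders.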
Tell me about the New Canaan Library’s VHS dubbing stations.
The Library recently acquired a VHS-to-DVD recorder for the digitization of our VHS collection. When I mentioned this during a presentation, I received a lot of questions, so we decided to buy a second unit for public use. It quickly became a very popular service, and now we have both machines set up for public use and we digitize our own materials when they are not booked by the public.
If money and resources were not obstacles for public libraries, what would you like to see public libraries do to help their communities understand personal digital archiving?
Events, classes and printed resources are of course really important. I also think it’s crucial that all public libraries provide the same general guidelines and teach a common set of steps for personal digital archiving, and that they help the public understand why continuous management of digital materials is necessary. For most people, archiving your material is an afterthought at best.
I think that one of the major mental hurdles people face is the enormous backlog that most people now have. It’s so important to keep guides and handouts concise and easy to follow, and to teach people how to do at least the bare minimum with the backlog while at the same time taking some proactive steps to prepare new material for the long run.
Do you have any thoughts about how to motivate librarians to give these workshops and keep them going regularly?
In my early conversations I found people to be passionately interested in the issues once they thought about them and I really want to emphasize how interesting these classes are to teach. They’ve also been very well attended, in my experience. It’s not hard to get the ball rolling, once you have a well-designed class package to help you get started.
Personal digital archiving really isn’t difficult, and is such an interesting area with huge potential for library programming and community building. I’m sure these types of classes will become standard at every library at some point.
What do you think the basic things are that every person needs to know about personal digital archiving?
If you want your digital materials to last, they need continuous management and care. If you do nothing to keep your materials organized, safe, and accessible, they will be lost.
What’s so special about libraries?
This is a rhetorical question, as I think libraries are amazing places. But many are dead serious in posing the query these days. To this point the answer has been new services built on top of the tremendous reservoir of goodwill that libraries have accrued over the decades. But technology continues to drive change.
A recent publication, Can’t Buy Us Love: The Declining Importance of Library Books and the Rising Importance of Special Collections, by Rick Anderson, brings this issue into focus. His thesis is simple and persuasive: physical general collections in academic research libraries are in declining use. As these collections become commodities–goods for which demand is broadly supplied without dependence on brand or other differentiation–a longstanding rationale for academic libraries is eroding. To counter this, Anderson argues that academic libraries need to highlight another traditional function: “gathering and curating of rare and unique documents, including primary source materials.”
To my mind the argument extends to local public libraries as well. The circulating books that have long been the bread and butter of public libraries are also becoming commodities. The question comes down to how libraries can best serve their communities as the future unfolds. The American Library Association report, Confronting the Future: Strategic Visions for the 21st-Century Public Library (PDF), considers this issue thoughtfully. One strategic decision public libraries face according to the report is termed “Portal to Archive.” In other words, libraries need to decide to what degree they are a means to connect with the wider information universe as opposed to a place that documents unique topics of interest to the local community.
This to me isn’t even a variable. Any library with public access to the internet and staff with basic familiarity gets fairly close to “portal” capability. On the other hand, a library can play an utterly unique role in collecting, preserving and providing access to one-of-a-kind materials that document the history and heritage of its community. Many public libraries have been doing this for a long time, either individually or in collaboration with other local institutions. This is a compelling activity, particularly if combined with public education outreach programming.
As Anderson notes, the key is rising above commodity status. A library relying on collections that are hard to differentiate from those available elsewhere runs the risk of redundancy. But a library with holdings that are uniquely valuable to its community provides a public service that is difficult, if not impossible, to replace. Plus, the value of the materials–and of the library’s public service–grows with online access.
Another consideration comes into play. Given a large enough scale, linked local collections could emerge as a powerful data collection with national value. As noted in Confronting the Future:
Taken together, the network of thousands of public libraries, each performing this function locally, would establish an unmatched data resource for those with practical, commercial, or academic interests in, say, the real estate values in Connecticut towns during the first decade of the 21st century; the relationships among demographics, educational opportunity, and criminal behavior in small midwestern towns; or any of a multitude of other possible questions that could be answered by accessing data from one or many of these locally based archives.
To answer the opening question, then, I would say that digital special collections are a key part of what makes libraries special.
I found it both truthful and inspiring...
Truthful, because the chaotic path of discovery involved in understanding mysterious digital media reflected my own experiences on similar digital preservation adventures, both for the library and for the AQuA and SPRUCE projects.
Inspiring, because it brought new light to my old concerns about format/software/hardware registry systems. I've long been worried that they have not been designed with their users in mind. Specifically, the users that know all of this information and are willing to spend time sharing it. Why would they do it? What incentive would they need? What form of knowledge sharing would they choose?
Upon reading Ben's article, things became clearer. As I twittered at the time:
Now, go through and read it one more time, and think about how such a registry could actually have helped. What would it need to include?
Could it really replace the expertise of those five (or so) people? Or should its purpose be to capture and link what they have achieved?
Is the answer really in building registries? Or is it better to run more XFR STNs and help document and preserve what they do?
Maybe we don't know what information we need? Maybe we don't even know who or what we are building registries for? Are we trying to replace imagination and expertise with an encyclopedia? Is it wrong to focus on the information, and ignore the people? Do we need a registry if we have a community of expertise to rely on? Should that community come first, and then be allowed to build whatever it needs? Maybe running and documenting more events like XFR STN and AQuA/SPRUCE is the only way to find out?
This is a guest post by Abbie Grotke, Web Archiving Team Lead.
We recently moved to a new house, and my husband, a professional musician, has been working on setting up a music and recording studio upstairs now that we have the room. Alongside the clarinets, saxophones and keyboard sit a desktop computer (with better recording setup), laptop (with better movie editing software) and various digital recording gadgets.
A recent project in the new studio combined an original musical composition and video with clips from a strange and wonderful 1907 silent film from the collections of the moving image archive at Internet Archive. This led to some experimentation with a brand-new-to-him piece of video editing software. Confusion about how files were stored, saved and exported from that software led to not one but TWO experiences of losing his files. I won’t go into the details here, but in one case a file was deleted when he tried to copy it from his laptop to his desktop computer, and in the other he accidentally overwrote his project file data. He had to recreate his entire movie project from scratch three times in all. As I repeatedly heard the soul-crushing tales of woe and horror via text messages and emails from home, my response was: “I wish there was a ‘sorry for your data loss’ card I could send you!”
When I got home, the more helpful “working in digital preservation” wife set about troubleshooting what had happened (alas, the files could not be retrieved). More importantly, I showed him how to make backups of his files-in-progress to avoid having this happen again. Not all was bad – the resulting movie turned out better in the long run than the original, and he definitely learned a lot about how to use the software and how to save and back up his files moving forward.
But I kept thinking about those cards… in this day and age of consumers producing and storing so much of their own digital content, and the chance of loss pretty high, wouldn’t it be great to be able to send a note to your distressed Grandma, your upset Aunt or overwrought college student? And spread a little message about digital preservation along the way?
It is unlikely that the major greeting card companies will set about producing such cards, so I thought I would take it upon myself to create a few mockups for some Friday fun, using some of the great images from the SPRUCE project’s Digital Preservation Illustrations, which were created for the Digital Preservation Business Case Toolkit. All those years in art school clearly paid off.
And one for the kiddos:
The following is a guest post by Lyssette Vazquez-Rodriguez, Program Support Assistant & Valeria Pina, Communications Assistant, both with OSI at the Library of Congress
The residents have finally arrived! After years of planning, the staff, hosts, and benefactors are thrilled to welcome the 2013-2014 National Digital Stewardship Residency Inaugural Class. The residents arrived on Tuesday, September 3rd, to begin a two-week orientation and workshop at the Library of Congress. In the workshop, they are receiving expert training on various components of digital stewardship before beginning their residencies at their host institutions.
The official inauguration ceremony was held on Wednesday, September 4th in the Montpelier room of the James Madison Building. Guests included representatives from the partnering host institutions, the Library of Congress, and the Institute of Museum and Library Services. George Coulbourne, Executive Program Officer for the Office of Strategic Initiatives, said, “this program is establishing the standard on digital preservation while creating action plans to prevent the obsolescence of traditional formats.”
The Deputy Librarian of Congress, Robert Dizard, Jr., pointed out that NDSR was not only going to help the residents and participating institutions, but would also help the Library identify areas of urgency in the field of digital preservation.
Susan Hildreth, Director of the Institute of Museum and Library Services, mentioned how impressed she was with the popularity of the program among the digital preservation community. She emphasized how the nine-month program, designed for multilateral learning, would grant invaluable understanding of the community’s needs.
The keynote speaker was Dr. Margaret Hedstrom, professor of Information at the University of Michigan, and faculty coordinator of the Archives and Records Management specialization within the Master of Science in Information program. As part of her internationally known research, she led the CAMiLEON project, which investigated the use of emulation tools as part of a strategy for long-term preservation of digital records. Her current research interests include digital preservation, cultural preservation strategies and outreach in developing countries. Dr. Hedstrom emphasized the need to discover innovative solutions for the digital preservation field, and encouraged the residents to continue with their projects. She was especially concerned with the barriers in the digital stewardship profession (digital native vs. digital migrant, analog vs. digital, etc.) and hoped that NDSR would help overcome these obstacles.
During an interview, the residents talked about their excitement at being part of the program and at putting the skills learned in their respective graduate programs into practice. Although they come from different academic and professional backgrounds, all of them have found a field they are passionate about. Heidi Dowding, who will be working at Dumbarton Oaks Research Library and Collection, explained how she wants to empower children outside of the classroom. She realized that, as a librarian, she can help people by providing underserved populations with access to educational resources. Erica Titkemeyer, who will be working in the Smithsonian Institution Archives, said she got into the field of digital preservation after realizing there was a great need to develop the knowledge and skills of underprivileged areas of the country.
Residents are eager to get to work and bring new and fresh ideas to their host institutions. NDSR is a step towards the development of leaders who will ensure the longevity of digital preservation. More details about the residents’ work at the host institutions will be covered in the next few weeks, but in the meantime, please visit the NDSR website.
Many of our readers may remember a unique blog post written by our former intern, Tess Webre. Tess took a very creative, educational approach to the subject of digital preservation and created Snow Byte and the Seven Formats, A Digital Preservation Fairy Tale.
This post turned out to be so popular (see the many comments), and, it had such visual appeal, that we were inspired to turn it into a video. So, here it is – Snow Byte and the Seven Formats, A Digital Preservation Fairy Tale, the video!
Snow Byte may have a tongue-in-cheek children’s book style, but the idea behind it is to illustrate the overall importance of digital preservation. Hopefully, this technology-oriented “fairy tale” will appeal to young people as an entertaining way to learn about this topic. As Tess mentioned in her earlier post, children are learning about digital material at a younger and younger age. So this story idea came about as an answer to the question, “What’s a good way to teach them about this topic?”
In the NDIIPP program, we are faced with the same question all the time, but mainly for adults – how can we get more people to pay attention to the increasing problem of digital loss? Since “Snow Byte” also includes such concepts as “metadata schema” and “proprietary file formats”, and highlights the issue of file backup, it could also appeal to library professionals or anyone else looking for a gentle introduction to digital preservation. In other words, Snow Byte is a fairy tale for all ages.
In case you were wondering who does the voices in the video, no, we didn’t happen to have a bona fide theater troupe available. However, this project brought out some otherwise hidden theatrical talent among the staff, who were brought together for this video as the “NDIIPP Radio Players”.
Does Snow Byte manage to evade the evil queen, and retrieve her magic spell? How does Snow Byte avoid digital disaster in the end? And who is this fellow, “Dublin”? Watch the video and find out. And, enjoy!
See “Snow Byte” and our other videos on digital preservation related topics on our video page.
I was at a recent meeting of the Federal Geographic Data Committee’s Coordination Group, where Anne Castle, the Assistant Secretary for Water and Science in the Department of the Interior and the co-chair of the FGDC Steering Committee, was discussing the challenges of finding resources to support geospatial activity. The federal geospatial community is working with a reduced budget (for example, the FGDC recently announced the cancellation of their long-running CAP grant program for FY 2013 and 2014), but a chief concern of the participants was not just shrinking resources for geospatial activity, but the challenge of structuring funding in a way that facilitates and encourages cross-agency collaboration and long-term thinking.
We in the stewardship community are no strangers to this problem, but there’s much we can learn from the experiences of the geospatial community. The geospatial community not only provides governance models for how we might go about our business, but they also are building tools we can tap into to help us tackle stewardship issues across organizations and generations.
The National Digital Stewardship Alliance Geospatial Content subgroup is exploring ways to engage with the wider geospatial community, concentrating recent efforts on opportunities to engage with the federal government’s Geospatial Platform:
“The Geospatial Platform will offer access to a suite of geospatial assets including data, services, applications, and infrastructure that will be known as the geospatial Platform offering…The Geospatial Platform will support an operational environment, www.GeoPlatform.gov, where customers can discover, access, and use shared data, services, applications, and when appropriate, infrastructure assets.” – from Modernization Roadmap for the Geospatial Platform (PDF), pg. 10.
The Platform builds upon existing federal interagency geospatial initiatives to share data, develop collaborative programs and establish standard national datasets. It is a significant effort that has become a chief focus of energy and attention across the federal geospatial data community.
The NDSA Geospatial Content subgroup has held recent discussions with Platform planners, who are eager to help get us engaged. The Platform is still in early development, but several features have been implemented: the ability to explore featured maps; build your own maps based on available data; and create “communities” around common interest areas to share information and maps. The stewardship community might find the third feature particularly interesting, along with these other potential benefits:
- Make historic digital geospatial collections more immediately accessible in a forum with high visibility and a potentially significant user base;
- An infrastructure to house a clearinghouse of information on the stewardship of digital geospatial data;
- Access to advanced tools to create maps and make them accessible, with other technical services (preservation?) coming in the future;
- Access to advanced tools and services without a significant investment in technical infrastructure on the part of any individual organization;
- Redundant storage for some portion of the community’s digital maps;
- Engagement with the broad community of geospatial data creators and users, providing collaborative opportunities;
- A possible central point for digital geospatial data for stewardship capture purposes;
- A venue to explore the role of stewarding organizations in the management of digital geospatial information of long-term value across the entire lifecycle;
- Early adoption provides participants with more dedicated technical support resources and reputational benefits.
Of course, these potential benefits are offset by issues that the digital stewardship community must address before moving forward with any kind of engagement.
First off, our community will need to clarify its purpose(s) for engaging with the Platform or similar activities (the academic-centric OpenGeoportal project has some similarities to the Platform and may offer another outlet for digital stewardship community participation). Do we see the Platform as a clearinghouse for information on geospatial preservation and stewardship, like geopreservation.org but embedded in another community? Or is it most useful as an access point to collections of historical digital geospatial data? Or both?
Who will manage a “historical geodata community” on the Platform? Are there enough interested NDSA members to take on the management of a Platform community, or is it necessary to build a wider coalition of willing participants? Do activities like the Platform provide enough benefit to justify the effort of using them as a central distribution point for historic digital data?
With all of this in mind, what are your thoughts on engaging with the Platform and activities like it? How can we most effectively marshal our community resources, both within the NDSA and across the wider community, to take advantage of opportunities like this as they arise?
I was staring at a blank screen when my colleague David came into my office. I semi-jokingly asked him for a blog topic.
“I have one for you,” he replied. “Content Archaeology. Discuss.” And with that he left my office.
People know that I trained as an archaeologist and did fieldwork in multiple locations. I still think of myself as a social scientist. This phrase resonates with me, and is a concept that I have discussed with others, more often under the rubric of “digital archaeology.” There is also the practice of using digital tools in archaeology, but that’s for another post.
In researching this, I did a bit of content archaeology myself. In the writing this morphed into a bit of a “Before You Were Born” post as well. This is a VERY truncated list of what one might consider digital archaeology.
- There was a very interesting article on digital archaeology in Wired in 1993. Yes, that’s really 1993.
- I read a very interesting article in the journal Social Semiotics by Gordon Fletcher and Anita Greenhill from 1996 entitled The Social Construction of Electronic Space that explicitly calls out digital archaeology as a methodology for research into virtual communities.
- There’s a UKOLN report titled Digital Archaeology: Rescuing Neglected and Damaged Data Resources by Seamus Ross and Ann Gow from 1999.
- I found a very illuminating paper from 2003 on what it took to reconstruct a set of UK education datasets known as The Schools Census.
- The digital archaeology story that is perhaps the most well-known to the public is the story from 2011 of the recovery of the Domesday Project, and its rebirth online.
- There is the Digital Archaeology project, aiming to recover disruptive moments in design and interactivity on the web. We interviewed Jim Boulton of Story Worldwide on The Signal in 2011.
- Mick Morrison at Flinders University posted an outline for a hands-on workshop on Digital Archaeology in 2011.
- Doug Reside of the New York Public Library wrote on Digital Archaeology: Recovering Your Digital History in 2012.
- I found a great 2013 case study from the University of Pennsylvania Museum of Archaeology and Anthropology in a blog post entitled Digital Archaeology — Uncovering a Website.
- In 2013 the New Museum launched a great experiment called XFR STN to help artists recover and migrate their digital art.
There is some holy grail content that the greater community would love to see found so that digital archaeology and preservation actions could be taken, such as the full set of Apollo 11 moon landing tapes or the lost Doctor Who episodes.
How do you define “Content Archaeology” or “Digital Archaeology”? What lost content would you like to see recovered?
In spite of my new job, I’m finding some time to work on JHOVE. Version 1.11a1 is now available for testing. Please give it a try and let me know of any problems.
When Sam Brylawski was a teenager he had to write a paper for his high school American history class about Gershwin’s “Rhapsody in Blue,” so he did something that was ambitious for a high school student: he traveled to the Library of Congress to examine the composition’s original manuscript in the Gershwin collection.
Brylawski found himself sitting at a table in front of the original manuscript, studying Gershwin’s music-notation “handwriting” – the often-stubby stems on the half notes, the squiggly rests, the hastily sketched but perfectly aligned syncopation and harmony almost bursting off the page. Wayne Shirley, who is a legend in the Library’s Music Division for his scholarship and encyclopedic knowledge, assisted Brylawski and pointed out some especially interesting sections.
“To actually examine a real Gershwin manuscript with Wayne Shirley’s amazing help was a great thrill,” said Brylawski. “Those things worked to get me hooked on the Library of Congress and on libraries in general.”
Hooked enough to work in the Library’s Recorded Sound Section every summer during college. Hooked enough to get a job there after graduating college, to immerse himself so deeply and thoroughly in his work that he would one day become the head of Recorded Sound. And hooked enough to crusade — in the 21st century — for unified action among public and private institutions to preserve and make accessible all recorded sound.
Brylawski, a recognized authority on the history and preservation of recorded sound, learned almost everything on the job, working side-by-side with scholars, talented engineers and recorded-sound savants, experts who get the best possible sound off of every recording medium.
Brylawski started out at the Library as a preservation technician, transferring recordings from disk to tape. Eventually he decided that he didn’t have the “ears” or the technical expertise to do the job the way it needed to be done, so he took a clerical job in the Library’s Recorded Sound Section and Recording Laboratory.
“It was a fabulous education,” said Brylawski. “It was sort of like being an apprentice in a reading room. I would help users look for things that they wanted to copy from the collections and I learned from Library professionals how to serve the public and the fundamentals of library work, as well as where everything was.”
Brylawski became a reference librarian in 1980 and a curator in the early 1990s. In addition to helping people find things, he worked with other staff to make things findable. They indexed unpublished recordings, primarily gift collections held by the Library, using information from the recordings’ engineering notes. This resulted in the Sound Online Inventory and Catalog, a database of over 200,000 recordings.
When James Billington became Librarian of Congress in 1987, one of his first major initiatives was to acquire Congressional funding to help the Library deal with its backlog of unprocessed materials. As a result, Recorded Sound staff and resources increased significantly. A symbol of that commitment is the Library’s National Audio-Visual Conservation Center in Culpeper, VA. Brylawski was on the executive team that planned the Center.
In 1996, Brylawski was chosen to head the Recorded Sound section of the Motion Picture, Broadcasting, and Recorded Sound Division. He said that in the years after his appointment, he observed two major changes.
“One was an increased emphasis on the importance of access,” he said. “And the other was a transition to digital collections and digital preservation.”
The American Memory project gave the public access to thousands of recordings from the stacks. It included some of the first online recorded sound collections from a major cultural institution.
Today the showpiece of online access to the Library’s Recorded Sound collections is the National Jukebox, one of the projects Brylawski devoted his time to after he retired from the Library in 2004. The Library created the Jukebox with Sony Music Entertainment in response to the National Recording Preservation Act of 2000 (which Brylawski contributed to), which states that “The Librarian [of Congress] shall…provide for reasonable access to the sound recordings and other materials in such collection for scholarly and research purposes.”
As for the Library’s transition to digital collections and digital preservation, that has been decades in the making. Digital recording has been around since the 1970s and commercial CDs have been available since the early 1980s. By the 1990s, Recorded Sound preferred CDs as the most reliable playback medium, mainly because CDs are not worn down by playback the way a phonograph needle wears down a record groove or a magnetic tape deteriorates.
Still, CDs are unreliable for long-term storage. Discs can be easily damaged by handling or by the environment and CD players will become obsolete, just as all media players eventually become obsolete. Besides, CDs are merely containers; the data is what is important.
Audio files are now transferred over the web in different formats and streamed in a variety of ways, and most of the time they are missing crucial metadata. And the Library is challenged to gather and preserve them.
In 2002 Brylawski published a comprehensive report, “The Preservation of Digitally Recorded Sound,” that articulated the complicated, multifaceted challenges involved with preserving recorded sound in the digital age.
He wrote about preserving streaming music and subscription-based music; about the proliferation of CD reissues of old vinyl and tape recordings, which vary in quality; about the explosion of native-born MP3s and their lack of metadata. And he wrote about how, more than ever, copyright can be an obstacle to preservation.
Brylawski is not against copyright. Quite the contrary. His family includes two very prominent copyright attorneys, one of whom began working with the Library of Congress more than 100 years ago. He appreciates that recorded sound has been a commercial business since its birth in the 19th century.
In the report, he observed that, “Record companies today feel bruised by the rampant swapping of music files…” He wrote about the copyright laws that do not realistically apply to digital preservation and how, in his opinion, those laws may impede the work of cultural institutions in preserving at-risk recorded sound.
Brylawski said, “Regarding copyright this is an interesting and very sensitive time. The music business has been very hard hit in this century. Record sales are way down from 20 years ago. Many in the business blame file-sharing for much of the decline. At the same time, it is my personal belief that property holders overplayed their hand when they fought to extend copyright terms in the late 1990s and one result has been a decline in public respect for copyright laws. Librarians need to work with the industry to build collaborations and preserve our audio heritage.”
Given his decades of work with recordings, Brylawski is also painfully aware of the unclaimed orphaned recordings that were copyrighted but not in print and not available for anyone to hear. He wrote about the recordings on decaying media that would be lost forever if action wasn’t taken soon and he said that it is imperative for everyone with an interest and a stake in recorded sound to collaborate on mutually beneficial solutions.
In 2010, Brylawski was a member of one of the six task forces that contributed to the comprehensive report, “The State of Recorded Sound Preservation in the United States: A National Legacy at Risk in the Digital Age,” which was sponsored by the National Recording Preservation Board.
Brylawski said, “The task forces met many times to debate and discuss and share concerns and possible solutions to various aspects of what might go into a national plan of action.”
The report examined the problems in exhaustive detail. Two years later the Library published a national plan of action, “The Library of Congress National Recording Preservation Plan.”
The plan is clear and tightly focused, organized into four main topic areas:
- Building the National Sound Recording Preservation Infrastructure
- Blueprint for Implementing Preservation Strategies
- Promoting Broad Public Access for Educational Purposes
- Long-Term National Strategies
Each topic area breaks down into a few sub-topics and within those are specific, practical recommendations for action. One recommendation is the call for education in digital audio preservation.
“There are few courses taught in audio preservation or preservation courses that touch on audio,” said Brylawski. “But there is no degree program in Preservation Management of Audio. And we hope that there will be. Also, the sands are shifting, so continuing education is necessary for preservation administrators and engineers.
“There is also the challenge of debriefing the classic preservation engineers who have techniques they have developed. We can tap and preserve their knowledge. There is a great deal of legacy knowledge that we are very concerned about losing as people leave the profession. Or worse, die. The National Recording Preservation Board is funding the Association for Recorded Sound Collections in doing some video oral histories of great engineers.”
Brylawski is concerned that his reports’ recommendations may not be reaching far enough to the local level, to smaller institutions, community orchestras, private collectors and others in the music business that might not be aware of the long-term threat to their collections or may not have the resources to archive their collections properly. He suspects there may be a vast quantity of recorded sound collections at large and at risk and he is helping develop methods of outreach and making resources easily accessible online.
Brylawski never slowed his pace after retirement. After he left the Library in 2004, he was appointed Editor and Co-Director of the Encyclopedic Discography of Victor Recordings, by the University of California, Santa Barbara. He is also chair of the Library’s National Recording Preservation Board.
“I have had a long interest in discography,” said Brylawski. “Comprehensive discographies are needed to study and fully understand recorded music and spoken word history. In addition, a discography can assist in cataloging and preservation planning — the latter by reducing redundancy.”
Brylawski is obviously fervent and committed to what he does and he is reverent about recordings. When he described to me the early acoustic recordings — where musicians played all together into a single acoustic cone that cut the recording directly onto a disc — his voice sounded awed as he referred to them as “snapshots of time.”
And in this new century, several long decades after Brylawski’s transformative experience at the Library of Congress researching the Gershwin manuscript, he had a hand in making accessible online — with the consent of all the stakeholders, for anyone and everyone to enjoy — a recording of Gershwin performing his “Rhapsody in Blue”.
The September 2013 Library of Congress Digital Preservation Newsletter (PDF) is now available.
In this issue:
- The Truth and Reconciliation Commission of Canada using the Levels of Digital Preservation
- Find out about the George Sanger Collection at UT Austin Videogame Archive
- Read an Analysis of Current Digital Preservation Policies
- What Is It That We Actually DO (at the Library of Congress)?
- Recent Interviews with: Matthew G. Kirschenbaum from the University of Maryland and Jason Scott from the Archive Team
- New and recently updated resources: The Digital Preservation Business Case Toolkit; The Activists’ Guide to Archiving Video; Digital Preservation Videos for the Classroom; Digital Preservation in a Box; Rich Online Resources Documenting the 1963 March on Washington
- Other news: Help Pick Panels for the 2014 South By Southwest Conference; Xporting Digital Format Sustainability Descriptions as XML; Format Migration and More Launching Points for Applied Research
- Upcoming Events: National Book Festival, Sept. 21-22, Washington, DC; Cultural Heritage Archives: Networks, Innovation & Collaboration Symposium, Sept. 26-27, Washington, DC; 2013 DLF Forum, Nov. 3-6, Austin, TX; Best Practices Exchange, Nov. 13-15, Salt Lake City, UT; Aligning National Approaches to Digital Preservation: An Action Assembly, Nov. 18-20, Barcelona, Spain
It’s time to declare Files that Last a flop.
Most books are flops. This wouldn’t be so bad in itself, but I produced the book with the help of a Kickstarter campaign. If you were one of my supporters, you wanted and expected something good. It’s clear from the lack of reviews and sales that I didn’t deliver. It ranks #271,981 on the Kindle best-seller list.
I wish I knew why. I got a few negative comments in private communication, but nothing deeply disappointed. There were the inevitable complaints about typos, even after proofreading. No one ever catches them all, and I don’t think that was the problem. There were a couple of complaints about omissions of favorite topics; that’s inevitable too. There were a few very enthusiastic public comments. What there wasn’t was any real reviews, any ratings on websites that offered the books, any discussion. Since I published the book, there hasn’t been a single comment on this blog other than blocked spam. Reviews are the life of a book, and FTL was DOA. People didn’t hate it; it just didn’t generate enough enthusiasm to get people to say anything about it publicly.
People do want a book on “digital preservation for everygeek.” I wouldn’t have gotten the support that I got on Kickstarter without that. What I delivered somehow wasn’t what you wanted. I hope this doesn’t discourage anyone else from making the effort, with more engaging writing, more relevant content, or whatever it was I didn’t provide.
As for me, on to other things. I may as well “remainder” the book, so here’s a Smashwords coupon code that’s good for 60% off (on the Smashwords site only) till the end of 2015: XY29D. Post it wherever you think people might be interested.
Thanks once again to my Kickstarter supporters, and to Matt and Terri for their work in making it a better book.