Feed aggregator

Evaluation of DICOM data ingest (without copying data to archiving system)

SCAPE Wiki Activity Feed - 31 July 2014 - 10:39am

Page edited by Tomasz Parkola

Categories: SCAPE

Performance depending on search criteria

OPF Wiki Activity Feed - 31 July 2014 - 10:28am

Page edited by Tomasz Parkola


Performance depending on search criteria

SCAPE Wiki Activity Feed - 31 July 2014 - 10:28am

Page edited by Tomasz Parkola

Categories: SCAPE

Digital Preservation 2014: It’s a Thing

The Signal: Digital Preservation - 30 July 2014 - 12:56pm

“Digital preservation makes headlines now, seemingly routinely. And the work performed by the community gathered here is the bedrock underlying such high profile endeavors.” – Matt Kirschenbaum


The registration table at Digital Preservation 2014. Photo credit: Erin Engle.

The annual Digital Preservation meeting, held each summer in Washington, DC, brings together experts in academia, government and the private and non-profit sectors to celebrate key work and share the latest developments, guidelines, best practices and standards in digital preservation.

Digital Preservation 2014, held July 22-24,  marked the 13th major meeting hosted by NDIIPP in support of the broad community of digital preservation practitioners (NDIIPP held two meetings a year from 2005-2007), and it was certainly the largest, if not the best. Starting with the first combined NDIIPP/National Digital Stewardship Alliance meeting in 2011, the annual meeting has rapidly evolved to welcome an ever-expanding group of practitioners, ranging from students to policy-makers to computer scientists to academic researchers. Over 300 people attended this year’s meeting.

“People don’t need drills; they need holes,” stated NDSA Coordinating Committee chairman Micah Altman, the Director of Research at the Massachusetts Institute of Technology Libraries,  in an analogy to digital preservation in his opening talk. As he went on to explain, no one needs digital preservation for its own sake, but it’s essential to support the rule of law, a cumulative evidence base, national heritage, a strategic information reserve, and to communicate to future generations. It’s these challenges that face the current generation of digital stewardship practitioners, many of which are addressed in the 2015 National Agenda for Digital Stewardship, which Altman previewed during his talk (and which will appear later this fall).


A breakout session at Digital Preservation 2014. Photo credit: Erin Engle.

One of those challenges is the preservation of the software record, which was eloquently illuminated by Matt Kirschenbaum, the Associate Director of the Maryland Institute for Technology in the Humanities, during his stellar talk, “Software, It’s a Thing.” Kirschenbaum ranged widely across computer history, art, archeology and pop culture with a number of essential insights. One of the more piquant was his sorting of software into different categories of “things” (software as asset, package, shrinkwrap, notation/score, object, craft, epigraphy, clickwrap, hardware, social media, background, paper trail, service, big data), each with its own characteristics. As Kirschenbaum noted, software is many different “things,” and we’ll need to adjust our future approaches to preservation accordingly.

Associate Professor at the New School Shannon Mattern took yet another refreshing approach, discussing the aesthetics of creative destruction and the challenges of preserving ephemeral digital art. As she noted, “by pushing certain protocols to their extreme, or highlighting snafus and ‘limit cases’ these artists’ work often brings into stark relief the conventions of preservation practice, and poses potential creative new directions for that work.”


Stephen Abrams, Martin Klein, Jimmy Lin and Michael Nelson during the “Web Archiving” panel. Photo credit: Erin Engle.

These three presentations on the morning of the first day provided a thoughtful intellectual substrate upon which a huge variety of digital preservation tools, services, practices and approaches were elaborated over the following days. As befits a meeting that convenes disparate organizations and interests, collaboration and community were big topics of discussion.

A Tuesday afternoon panel on “Community Approaches to Digital Stewardship” brought together a quartet of practitioners who are working collaboratively to advance digital preservation practice across a range of organizations and structures, including small institutions (the POWRR project); data stewards (the Research Data Alliance); academia (the Academic Preservation Trust); and institutional consortiums (the Five College Consortium).

Later, on the second day, a well-received panel on the “Future of Web Archiving” displayed a number of clever collaborative approaches to capturing the digital materials from the web, including updates on the Memento project and Warcbase, an open-source platform for managing web archives.


CurateCamp: Digital Culture. Photo credit: Erin Engle.

In between there were plenary sessions on stewarding space and research data, and over three dozen lightning talks, posters and breakout sessions covering everything from digital repositories for museum collections to a Brazilian digital preservation network to the debut of a new digital preservation questions and answers tool. Additionally, a CurateCamp unconference on the topic of “Digital Culture” was held on a third day at Catholic University, thanks to the support of the CUA Department of Library and Information Science.

The main meeting closed with a thought-provoking presentation from artist and digital conservator Dragan Espenschied. Espenschied utilized emulation and other novel tools to demonstrate some of the challenges related to presenting works authentically, in particular works from the early web and those dependent on a range of web services. Espenschied, also the Digital Conservator at Rhizome, has an ongoing project, One Terabyte of Kilobyte Age, that explores the material captured in the Geocities special collection. Associated with that project is a Tumblr he created that automatically generates a new screenshot from the Geocities archive collection every 20 minutes.

Web history, data stewardship, digital repositories; for digital preservation practitioners it was nerd heaven. Digital preservation 2014, it’s a thing. Now on to 2015!

Categories: Planet DigiPres

EVAL ARC2WARC-HDP w.o.Tika

OPF Wiki Activity Feed - 30 July 2014 - 10:06am

Page edited by Rune Bruun Ferneke-Nielsen


EVAL ARC2WARC-HDP w.o.Tika

SCAPE Wiki Activity Feed - 30 July 2014 - 10:06am

Page edited by Rune Bruun Ferneke-Nielsen

Categories: SCAPE

Art is Long, Life is Short: the XFR Collective Helps Artists Preserve Magnetic and Digital Works

The Signal: Digital Preservation - 29 July 2014 - 2:44pm

XFR STN (“Transfer Station”) is a grass-roots digitization and digital-preservation project that arose as a response from the New York arts community to rescue creative works from aging or obsolete audiovisual formats and media. The digital files are stored by the Library of Congress’s NDIIPP partner, the Internet Archive, and are accessible for free online. At the recent Digital Preservation 2014 conference, the NDSA gave XFR STN the NDSA Innovation Award. Last month, members of the XFR Collective — Rebecca Fraimow, Kristin MacDonough, Andrea Callard and Julia Kim — answered a few questions for the Signal.

“VHS 1,” courtesy of Walter Forsberg.

Mike: Can you describe the challenges the XFR Collective faced in its formation?

XFR: Last summer, the New Museum hosted a groundbreaking exhibit called XFR STN.  Initiated by the artist collective Colab and the resulting MWF Video Club, the exhibit was a major success. By the end of the exhibition over 700 videos had been digitized with many available online through the Internet Archive.

It was clear to all of us involved that there was a real demand for these services and that there were many under-served artists having difficulty preserving and accessing their own media. Many of the people involved with the exhibit became passionate about continuing the service of preserving obsolete magnetic and digital media for artists. We wanted to offer a long-term, non-commercial, grassroots solution.

Using the experience of working on XFR STN as a jumping-off point, we began developing XFR Collective as a separate nonprofit initiative to serve the need that we saw.  Over the course of our development, we’ve definitely faced — and are still facing — a number of challenges in order to make ourselves effective and sustainable.

“VHS 2,” courtesy of Walter Forsberg.

Perhaps the biggest challenge has simply been deciding what form XFR Collective was going to take.  We started out with a bunch of borrowed equipment and a lot of enthusiasm, so the one thing we knew we could do was digitize, but we had to sit down and really think about things like organizational structure, sustainable pricing for our services, and the convoluted process of becoming a non-profit.

Eventually, we settled on a membership-based structure in order to be able to keep our costs as low as possible.  A lot of how we’re operating is still very experimental — this summer wraps up our six-month test period, during which we limited ourselves to working with only a small number of partners to allow us to figure out what our capacity was and how we could design our projects in the future.

We’ve got a number of challenges still ahead of us — finding a permanent home is a big one — and we still feel like we’re only just getting started, in terms of what we can do for the community of artists who use our services.  It’s going to be interesting for all of us to see how we develop.  We’ve started thinking of ourselves as kind of a grassroots preservation test kitchen. We’ll try almost any kind of project once to see if it works!

Mike: Where are the digital files stored? Who maintains them?

XFR: Our digital files will be stored with the membership organizations and uploaded to the Internet Archive for access and for long-term open-source preservation.  This is an important distinction that may confuse some people: XFR Collective is not an archive.

While we advocate and educate about best practices, we will not hold any of the digital files ourselves; we just don’t have the resources to maintain long-term archival storage.  We encourage material to go onto the Internet Archive because long-term accessibility is part of our mission and because the Internet Archive has the server space to store uncompressed and lossless files as well as access files.  That way if something happens to the storage that our partners are using for their own files, they can always re-download them.  But we can’t take responsibility for those files ourselves. We’re a service point, not a storage repository.

“VHS 3,” courtesy of Walter Forsberg.

Mike: Regarding public access as a means of long-term preservation and sustainability, how do you address copyrighted works?

XFR: This is a great question that confounds a lot of our collaborators initially.  Access-as-preservation creates a lot of intellectual property concerns.  Still, we’re a very small organization, so we can afford to take more risks than a more high-profile institution.  We don’t delve too deeply into the area of copyright; our concern is with the survival of the material.  If someone has a complaint, the Internet Archive will give us a warning in time to re-download the content and then remove it. But so far we haven’t had any complaints.

Mike: What open access tools and resources do you use?

XFR: The Internet Archive itself is something of an open access resource and we’re seeing it used more and more frequently as a kind of accessory to preservation, which is fantastic.  Obviously it’s not the only solution, and you wouldn’t want to rely on that alone any more than you would any kind of cloud storage, but it’s great to have a non-commercial option for streaming and storage that has its own archival mission and that’s open to literally anyone and anything.

Mike:  If anyone is considering a potential collaboration to digitally preserve audiovisual artwork, what can they learn from the experiences of the XFR Collective?

XFR: Don’t be afraid to experiment!  A lot of what we’ve accomplished is just by saying to ourselves that we have to start doing something, and then jumping in and doing it.  We’ve had to be very flexible. A lot of the time we’ll decide something as a set proposition and then find ourselves changing it as soon as we’ve actually talked with our partners and understood their needs.  We’re evolving all the time but that’s part of what makes the work we do so exciting.

We’ve also had a lot of help and we couldn’t have done any of what we’ve accomplished without support and advice from a wide network of individuals, ranging from the amazing team at XFR STN to video archivists across New York City.  None of these collaborations happen in a vacuum, so make friendships, make partnerships, and don’t be nervous about asking for advice.  There are a lot of people out there who care about video preservation and would love to see more initiatives out there working to make it happen.

Categories: Planet DigiPres

The MH17 Crash and Selective Web Archiving

The Signal: Digital Preservation - 28 July 2014 - 4:34pm

The following is a guest post by Nicholas Taylor, Web Archiving Service Manager for Stanford University Libraries.

Screenshot of the 17 July 2014 15:57 UTC archive snapshot of the deleted VKontakte Strelkov blog post regarding the downed aircraft, on the Internet Archive Wayback Machine (https://web.archive.org/web/20140717155720/https://vk.com/wall-57424472_7256).

The Internet Archive Wayback Machine has been mentioned in several news articles within the last week  (see here, here and here) for having archived a since-deleted blog post from a Ukrainian separatist leader touting his shooting down a military transport plane which may have actually been Malaysia Airlines Flight 17. At this early stage in the crash investigation, the significance of the ephemeral post is still unclear, but it could prove to be a pivotal piece of evidence.

An important dimension of the smaller web archiving story is that the blog post didn’t make it into the Wayback Machine by the serendipity of Internet Archive’s web-wide crawlers; an unknown but apparently well-informed individual identified it as important and explicitly designated it for archiving.

Internet Archive crawls the Web every few months, tends to seed those crawls from online directories or compiled lists of top websites that favor popular content, archives more broadly across websites than it does deeply on any given website, and embargoes archived content from public access for at least six months. These parameters make the Internet Archive Wayback Machine an incredible resource for the broadest possible swath of web history in one place, but they don’t dispose it toward ensuring the archiving and immediate re-presentation of a blog post with a three-hour lifespan on a blog that was largely unknown until recently.

Recognizing the value of selective web archiving for such cases, many memory organizations engage in more targeted collecting. Internet Archive itself facilitates this approach through its subscription Archive-It service, which makes web archiving approachable for curators and many organizations. A side benefit is that content archived through Archive-It propagates with minimal delay to the Internet Archive Wayback Machine’s more comprehensive index. Internet Archive also provides a function to save a specified resource into the Wayback Machine, where it immediately becomes available.
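Triggering such an on-demand capture can also be done programmatically. Below is a minimal Python sketch; it assumes the public Save Page Now endpoint accepts a plain GET of the form https://web.archive.org/save/<url>, which is how the feature is commonly invoked, though this post does not document the interface:

    import urllib.request

    def save_page_now(url: str) -> str:
        """Ask the Wayback Machine to capture `url` immediately.

        Assumes a simple GET of https://web.archive.org/save/<url> triggers the
        capture; the location of the fresh snapshot is usually reported in a
        response header, with the final response URL as a fallback.
        """
        request = urllib.request.Request(
            "https://web.archive.org/save/" + url,
            headers={"User-Agent": "selective-archiving-example/0.1"},
        )
        with urllib.request.urlopen(request) as response:
            return response.headers.get("Content-Location", response.geturl())

    if __name__ == "__main__":
        print(save_page_now("https://example.com/"))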

Considering the six-month access embargo, it’s safe to say that the provenance of everything that has so far been archived and re-presented in the Wayback Machine relating to the five-month-old Ukraine conflict is either the Archive-It collaborative Ukraine Conflict collection or the Wayback Machine Save Page Now function. In other words, all of the content preserved and made accessible to date, including the key blog post, reflects deliberate curatorial decisions on the part of individuals and institutions.

A curator at the Hoover Institution Library and Archives with a specific concern for the VKontakte Strelkov blog actually added it to the Archive-It collection with a twice-daily capture frequency at the beginning of July. Though the key blog post was ultimately recorded through the Save Page Now feature, what’s clear is that subject area experts play a vital role in focusing web archiving efforts and, in this case, facilitated the preservation of a vital document that would not otherwise have been archived.

At the same time, selective web archiving is limited in scope and can never fully anticipate what resources the future will have wanted us to save, underscoring the value of large-scale archiving across the Web. It’s a tragic incident but an instructive example of how selective web archiving complements broader web archiving efforts.

Categories: Planet DigiPres

Song identification on GitHub

File Formats Blog - 24 July 2014 - 11:42am

The code for my song identification “nichesourcing” web application is now available on GitHub. It’s currently aimed at one project, as I’d mentioned in my earlier post, but has potential for wide use. It allows the following:

  • Users can register as editors or contributors. Only registered users have access.
  • Editors can post recording clips with short descriptions.
  • Contributors can view the list of clips and enter reports on them.
  • Reports specify type of sound, participants, song titles, and instruments. Contributors can enter as much or as little information as they’re able to.
  • Editors can modify clip metadata, delete clips, and delete reports.
  • Contributors and editors can view reports.
  • More features are planned, including an administrator role.

This is my first PHP coding project of any substance, so I’m willing to accept comments about my overall coding approach. It’s inevitable that, to some degree, I’m writing PHP as if it’s Java. If there are any standard practices or patterns I’m overlooking, let me know.
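Purely as a language-agnostic illustration of the access rules in the feature list above (the project itself is written in PHP, and none of these names come from its codebase), the role model might be sketched like this:

    from dataclasses import dataclass

    # Hypothetical permission table mirroring the feature list; the real project
    # is PHP and uses none of these identifiers.
    PERMISSIONS = {
        "editor": {"post_clip", "edit_clip_metadata", "delete_clip",
                   "delete_report", "view_reports"},
        "contributor": {"view_clips", "submit_report", "view_reports"},
    }

    @dataclass
    class Report:
        """A contributor's report on a clip; every field is optional."""
        clip_id: int
        sound_type: str = ""
        participants: str = ""
        song_titles: str = ""
        instruments: str = ""

    def may(role: str, action: str) -> bool:
        """Only registered users (editors or contributors) have any access."""
        return action in PERMISSIONS.get(role, set())

    assert may("editor", "delete_clip")
    assert not may("contributor", "delete_clip")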


Tagged: music, software, songid
Categories: Planet DigiPres

Understanding the Participatory Culture of the Web: An Interview with Henry Jenkins

The Signal: Digital Preservation - 24 July 2014 - 10:51am

Henry Jenkins, Provost Professor of Communication, Journalism, and Cinematic Arts, with USC Annenberg School for Communication and the USC School of Cinematic Arts.

The following is a guest post from Julia Fernandez, this year’s NDIIPP Junior Fellow. Julia has a background in American studies and working with folklife institutions and is working on a range of projects related to CurateCamp Digital Culture. This is part of an ongoing series of interviews Julia is conducting to better understand the kinds of born-digital primary sources folklorists, and others interested in studying digital culture, are making use of for their scholarship.

Anyone who has ever liked a TV show’s page on Facebook or proudly sported a Quidditch t-shirt knows that being a fan goes beyond the screen or page. With the growth of countless blogs, tweets, Tumblr gifsets, YouTube videos, Instagram hashtags, fanart sites and fanfiction sites, accessing fan culture online has never been easier. Whether understood as a vernacular web or as the blossoming of a participatory culture, individuals across the world are using the web to respond to and communicate their own stories.

As part of the NDSA Insights interview series, I’m delighted to interview Henry Jenkins, professor at the USC Annenberg School for Communication and self-proclaimed Aca-Fan. He is the author of one of the foundational works exploring fan cultures, “Textual Poachers: Television Fans and Participatory Culture,” as well as a range of other books, including “Convergence Culture: Where Old and New Media Collide,” and is most recently the co-author (with Sam Ford and Joshua Green) of “Spreadable Media: Creating Value and Meaning in a Networked Culture.” He blogs at Confessions of an Aca-Fan.

Julia: You state on your website that your time at MIT, “studying culture within one of the world’s leading technical institutions” gave you “some distinctive insights into the ways that culture and technology are reshaping before our very eyes.”  How so? What are some of the changes you’ve observed, from a technical perspective and/or a cultural one?

Henry: MIT was one of the earliest hubs on the Internet. When I arrived there in 1989, Project Athena was in its prime; the MIT Media Lab was in its first half decade and I was part of a now legendary Narrative Intelligence Reading Group (PDF) which brought together some of the smartest of their graduate students and a range of people interested in new media from across Cambridge; many of the key thinkers of early network culture were regular speakers at MIT; and my students were hatching ideas that would become the basis for a range of Silicon Valley start-ups. And it quickly became clear to me that I had a ringside seat for some of the biggest transformations in the media landscape in the past century, all the more so because through my classes, the students were helping me to make connections between my work on fandom as a participatory culture and a wide array of emerging digital practices (from texting to game mods).

Kresge Auditorium, MIT, Historic American Buildings Survey/Historic American Engineering Record/Historic American Landscapes Survey, Library of Congress Prints and Photographs Division, http://hdl.loc.gov/loc.pnp/hhh.ma1361/photos.080151

Studying games made sense at MIT because “Spacewar,” one of the first known uses of computers for gaming, had been created by the MIT Model Railroad club in the early 1960s. I found myself helping to program a series that the MIT Women’s Studies Program was running on gender and cyberspace, from which the materials for my book, “From Barbie to Mortal Kombat” emerged. Later, I would spend more than a decade as the housemaster of an MIT dorm, Senior House, which is known to be one of the most culturally creative at the Institute.

Through this, I was among the first outside of Harvard to get a Facebook account; I watched students experimenting with podcasting, video-sharing and file-sharing. Having MIT after my name opened doors at all of the major digital companies and so I was able to go behind the scenes as some of these new technologies were developing, and also see how they were being used by my students in their everyday lives.

So, through the years, my job was to place these developments in their historical and cultural contexts — often literally as Media Lab students would come to me for advice on their dissertation projects, but also more broadly as I wrote about these developments through Technology Review, the publication for MIT’s alumni network. It was there where many of the ideas that would form “Convergence Culture” were first shared with my readers. And the students that came through the Comparative Media Studies graduate program have been at ground zero for some of the key developments in the creative industries in recent years — from the Veronica Mars Kickstarter campaign to the community building practices of Etsy, from key developments in the games and advertising industry to cutting edge experiments in transmedia storytelling. The irony is that I had been really reluctant about accepting the MIT job because I suffer from fairly serious math phobia. :-)

Today, I enjoy another extraordinary vantage point as a faculty member at USC, embedded in both the Annenberg School of Communication and Journalism and the Cinema School, and thus positioned to watch how Hollywood and American journalism are responding to the changes that networked communication has forced upon them. I am able to work with future filmmakers who are trying to grasp a shift from a focus on individual stories to an emphasis on world-building, journalists who are trying to imagine new relationships with their publics, and activists who are seeking to make change by any media necessary.

Julia: Much of your work has focused on reframing the media audience as active and creative participants in creating media, rather than passive consumers. You’ve critiqued the use of the terms “viral” and “memes” to describe internet phenomena as “stripping aside the concept of human agency,” arguing that the biological language “confuses the actual power relations between producers, properties, brands and consumers.” Can you unpack some of your critiques for us? What is at stake?

Henry: At the core of “Spreadable Media” is a shift in how media travels across the culture. On the one hand, there is distribution as we have traditionally understood it in the era of mass media where content flows in patterns regulated by decisions made by major corporations who control what we see, when we see it and under what conditions. On the other hand, there is circulation, a hybrid system, still shaped top-down by corporate players, but also bottom-up by networks of everyday people, who are seeking to move media that is meaningful to them across their social networks, and will take media where they want it when they want it through means both legal and illegal. The shift towards a circulation-based model for media access is disrupting and transforming many of our media-related practices, and it is not explained well by a model which relies so heavily on metaphors of infection and assumptions of irrationality.

The idea of viral media is a way that the broadcasters hold onto the illusion of their power to set the media agenda at a time when that power is undergoing a crisis. They are the ones who make rational calculations, able to design a killer virus which infects the masses, so they construct making something go viral as either arcane knowledge that can be sold at a price from those in the know or as something that nobody understands, “It just went viral!” But, in fact, we are seeing people, collectively and individually, make conscious decisions about what media to pass to which networks for what purposes with what messages attached through which media channels and we are seeing activist groups, religious groups, indie media producers, educators and fans make savvy decisions about how to get their messages out through networked communications.

Julia: Cases like the Harry Potter Alliance suggest the range of ways that fan cultures on the web function as a significant cultural and political force. Given the significance of fandom, what kinds of records of their online communities do you think will be necessary in the future for us to understand their impact? Said differently, what kinds of records do you think cultural heritage organizations should be collecting to support the study of these communities now and into the future?

Henry: This is a really interesting question. My colleague, Abigail De Kosnik at UC-Berkeley, is finishing up a book right now which traces the history of the fan community’s efforts to archive their own creative output over this period, which has been especially precarious, since we’ve seen some of the major corporations which fans have used to spread their cultural output to each other go out of business and take their archives away without warning or change their user policies in ways that forced massive numbers of people to take down their content.


Image of Paper Print Films in Library of Congress collection. Jenkins notes this collection of prints likely makes it easier to write the history of the first decade of American cinema than to write the history of the first decade of the web.

The reality is that it is probably already easier to write the history of the first decade of American cinema, because of the paper print collection at the Library of Congress, than it is to write the history of the first decade of the web. For that reason, there has been surprisingly little historical research into fandom — even though some of the communication practices that fans use today go back to the publication practices of the Amateur Press Association in the mid-19th century. And even recently, major collections of fan-produced materials have been shunted from library to archive with few in your realm recognizing the value of what these collections contain.

Put simply, many of the roots of today’s more participatory culture can be traced back to fan practices over the last century. Fans have been amongst the leading innovators in terms of the cultural uses of new media. But collecting this material is going to be difficult: fandom is a dispersed but networked community which does not work through traditional organizations; there are no gatekeepers (and few recordkeepers) in fandom, and the scale of fan production — hundreds of thousands if not millions of new works every year — dwarfs that of commercial publishing. And that’s to focus only on fan fiction and does not even touch the new kinds of fan activism that we are documenting for my forthcoming book, By Any Media Necessary. So, there is an urgent need to archive some of these materials, but the mechanisms for gathering and appraising them are far from clear.

Julia: Your New Media Literacy project aims in part to “provide adults and youth with the opportunity to develop the skills, knowledge, ethical framework and self-confidence needed to be full participants in the cultural changes which are taking place in response to the influx of new media technologies, and to explore the transformations and possibilities afforded by these technologies to reshape education.” In one of your pilot programs, for instance, students studied “Moby-Dick” by updating the novel’s Wikipedia page. Can you tell us a little more about this project? What are some of your goals? Further, what opportunities do you think libraries have to enable this kind of learning?

Henry: We documented this project through our book, “Reading in a Participatory Culture,” and through a free online project, Flows of Reading. It was inspired by the work of Ricardo Pitts-Wiley, the head of the Mixed Magic Theater in Rhode Island, who was spending time going into prisons to get young people to read “Moby-Dick” by getting them to rewrite it, imagining who these characters would be and what issues they would be confronting if they were part of the cocaine trade in the 21st century as opposed to the whaling trade in the 19th century. This resonated with the work I have been doing on fan rewriting and fan remixing practices, as well as what we know about, for example, the ways hip hop artists sample and build on each other’s work.

So, we developed a curriculum which brought together Melville’s own writing and reading practices (as the master mash-up artist of his time) with Pitts-Wiley’s process in developing a stage play that was inspired by his work with the incarcerated youth and with a focus on the place of remix in contemporary culture. We wanted to give young people tools to think ethically and meaningfully about how culture is actually produced and to give teachers a language to connect the study of literature with contemporary cultural practices. Above all, we wanted to help students learn to engage with literary texts creatively as well as critically.

We think libraries can be valuable partners in such a venture, all the more so as regimes of standardized testing make it hard for teachers to bring complex 19th century novels like “Moby-Dick” into their classes or focus student attention on the process and cultural context of reading and writing as literacy practices. Doing so requires librarians to think of themselves not only as curators of physical collections but as mentors and coaches who help students confront the larger resources and practices opened up to them through networked communication. I’ve found librarians and library organizations to be vital partners in this work through the years.

Julia: Your latest book is on the topic of “spreadable media,” arguing that “if it doesn’t spread, it’s dead.”  In a nutshell, how would you define the term “spreadable media”?

Henry:  I talked about this a little above, but let me elaborate. We are proposing spreadable media as an alternative to viral media in order to explain how media content travels across a culture in an age of Facebook, Twitter, YouTube, Reddit, Tumblr, etc. The term emphasizes the act of spreading and the choices which get made as people appraise media content and decide what is worth sharing with the people they know. It places these acts of circulation in a cultural context rather than a purely technological one. At the same time, the word is intended to contrast with older models of “stickiness,” which work on the assumption that value is created by locking down the flow of content and forcing everyone who wants your media to come to your carefully regulated site. This assumes a kind of scarcity where we know what we want and we are willing to deal with content monopolies in order to get it.

But, the reality is that we have more media available to us today than we can process: we count on trusted curators — primarily others in our social networks but also potentially those in your profession — to call media to our attention and the media needs to be able to move where the conversations are taking place or remain permanently hidden from view. That’s the spirit of “If it doesn’t spread, it’s dead!” If we don’t know about the media, if we don’t know where to find it, if it’s locked down where we can’t easily get to it, it becomes irrelevant to the conversations in which we are participating. Spreading increases the value of content.

Julia: What does spreadable media mean to the conversations libraries, archives and museums could  have with their patrons? How can archives be more inclusive of participatory culture?

Henry:  Throughout the book, we use the term “appraisal” to refer to the choices everyday people make, collectively and personally, about what media to pass along to the people they know. Others are calling this process “curating.” But either way, the language takes us immediately to the practices which used to be the domain of “libraries, archives, and museums.” You were the people who decided what culture mattered, what media to save from the endless flow, what media to present to your patrons. But that responsibility is increasingly being shared with grassroots communities, who might “like” something or “vote something up or down” through their social media platforms, or simply decide to intensify the flow of the content through tweeting about it.

We are seeing certain videos reach incredible levels of circulation without ever passing through traditional gatekeepers. Consider “Kony 2012,” which reached more than 100 million viewers in its first week of circulation, totally swamping the highest grossing film at the box office that week (“Hunger Games”) and the highest viewed series on American television (“Modern Family”), without ever being broadcast in a traditional sense. Minimally, that means that archivists may be confronting new brokers of content, museums will be confronting new criteria for artistic merit, and libraries may be needing to work hand in hand with their patrons as they identify the long-term information needs of their communities. It doesn’t mean letting go of their professional judgement, but it does mean examining their prejudices about what forms of culture might matter and it does mean creating mechanisms, such as those around crowd-sourcing and perhaps even crowd-funding, which help to insure greater responsiveness to public interests.

Julia: You wrote in 2006 that there is a lack of fan involvement with works of high culture because “we are taught to think about high culture as untouchable,” which in turn has to do with “the contexts within which we are introduced to these texts and the stained glass attitudes which often surround them.” Further, you argue that this lack of a fan culture makes it difficult to engage with a work, either intellectually or emotionally. Can you expand on this a bit? Do you still believe this to be the case, or has this changed with time? Does the existence of transformative works like “The Lizzie Bennet Diaries” on YouTube or vibrant Austen fan communities on Tumblr reveal a shift in attitudes? Finally, how can libraries, museums, and other institutions help foster a higher level of emotional and intellectual engagement?

Henry: Years ago, I wrote “Science Fiction Audiences” with the British scholar John Tulloch in which we explored the broad range of ways that fans read and engaged with “Star Trek” and “Doctor Who.” Tulloch then went on to interview audiences at the plays of Anton Chekhov and discovered a much narrower range of interpretations and meanings — they repeated back what they had been taught to think about the Russian playwright rather than making more creative uses of their experience at the theater. This was probably the opposite of the way many culture brokers think about the high arts — as the place where we are encouraged to think and explore — and popular arts — as works that are dummied down for mass consumption. This is what I meant when I suggested that the ways we treat these works cut them off from popular engagement.

At the same time, I am inspired by recent experiments which merge the high and the low. I’ve already talked about Mixed Magic’s work with “Moby-Dick,” but “The Lizzie Bennet Diaries” is another spectacular example. It’s an inspired translation of Jane Austen’s world through the mechanisms of social media: gossip and scandal play such a central role in her works; she’s so attentive to what people say about each other and how information travels through various social communities. And the playful appropriation and remixing of “Pride and Prejudice” there has opened up Austen’s work to a whole new generation of readers who might otherwise have known it entirely through Sparknotes and plodding classroom instruction. There are certainly other examples of classical creators — from Gilbert and Sullivan to Charles Dickens and Arthur Conan Doyle — who inspire this kind of fannish devotion from their followers, but by and large, this is not the spirit with which these works get presented to the public by leading cultural institutions.

I would love to see libraries and museums encourage audiences to rewrite and remix these works, to imagine new ways of presenting them, which make them a living part of our culture again. Lawrence Levine’s “Highbrow/Lowbrow” contrasts the way people dealt with Shakespeare in the 19th century — as part of the popular culture of the era — with the ways we have assumed across the 20th century that an appreciation of the Bard is something which must be taught because it requires specific kinds of cultural knowledge and specific reading practices. Perhaps we need to reverse the tides of history in this way and bring back a popular engagement with such works.

Julia: You’re a self-described academic and fan, so I’d be interested in what you think are some particularly vibrant fan communities online that scholars should be paying more attention to.


Screenshot of the VlogBrothers, Hank and John Green, as they display a symbol of their channel in a video titled “How To Be a Nerdfighter: A Vlogbrothers FAQ”

Henry: The first thing I would say is that librarians, as individuals, have long been an active presence in the kinds of fan communities I study; many of them write and read fan fiction, for example, or go to fan conventions because they know these as spaces where people care passionately about texts, engage in active debates around their interpretation, and often have deep commitments to their preservation. So, many of your readers will not need me to point out the spaces where fandom is thriving right now; they will know that fans have been a central part of the growth of the Young Adult Novel as a literary category which attracts a large number of adult readers, so they will be attentive to “Harry Potter,” “Hunger Games,” or the Nerdfighters (who are followers of the YA novels of John Green); they will know that fans are being drawn right now to programs like “Sleepy Hollow” which have helped to promote more diverse casting on American television; and they will know that now, as always, science fiction remains a central tool which incites the imagination and creative participation of its readers. The term Aca-Fan has been a rallying point for a generation of young academics who became engaged with their research topics in part through their involvement within fandom. Whatever you call them, there needs to be a similar movement to help librarians, archivists and curators come out of the closet, identify as fans, and deploy what they have learned within fandom more openly through their work.

Categories: Planet DigiPres

So long, Microsoft! UK government abandons Office, embraces free-to-use ... - Expert Reviews

Google News Search: "new file format" - 23 July 2014 - 11:55am

When consultation into a new file format for government departments opened earlier this year Microsoft wanted its own Open XML format to be included. "While including ODF is a choice that Microsoft supports, ignoring and omitting OpenXML will ensure ...

Categories: Technology Watch

Future Steward on Stewardship’s Future: An Interview with Emily Reynolds

The Signal: Digital Preservation - 23 July 2014 - 10:44am

Emily Reynolds, Winner of 2014 Future Steward NDSA Innovation Award.

Each year, the NDSA Innovation Working Group reviews nominations from members and non-members alike for the Innovation Awards. Most of those awards are focused on recognizing individuals, projects and organizations that are at the top of their game.

The Future Steward award is a little different. It’s focused on emerging leaders, and while the recipients of the Future Steward award have all made significant accomplishments and achievements, they have done so as students, learners and professionals in the early stages of their careers. Mat Kelly’s work on WARCreate, Martin Gengebach’s work on forensic workflows and now Emily Reynolds’ work in a range of organizations on digital preservation exemplify how some of the most vital work in digital preservation is being taken on and accomplished by some of the newest members of our workforce.

I’m thrilled to be able to talk with Emily, who picked up this year’s Future Steward award yesterday during the Digital Preservation 2014 meeting, about the range of her work and her thoughts on the future of the field. Emily was recognized for the quality of her work in a range of internships and student positions with the Interuniversity Consortium for Political and Social Research, the University of Michigan Libraries, the Library of Congress, Brooklyn Historical Society, StoryCorps, and, in particular, her recent work on the World Bank’s eArchives project.


Screenshot of the Arab American National Museum’s web archive collections.

Trevor: You have a bit of experience working with web archives at different institutions; scoping web archive projects with the Arab American National Museum, putting together use cases for the Library of Congress and in your coursework at the University of Michigan. Across these experiences, what are your reflections and thoughts on the state of web archiving for cultural heritage organizations?

Emily: It seems to me that many cultural heritage organizations are still uncertain as to where their web archive collections fit within the broader collections of their organization. Maureen McCormick Harlow, a fellow National Digital Stewardship Resident, often spoke about this dynamic; the collections that she created have been included in the National Library of Medicine’s general catalog. But for many organizations, web collections are still a novelty or a fringe part of the collections, and aren’t as discoverable. Because we’re not sure how the collections will be used, it’s difficult to provide access in a way that will make them useful.

I also think that there’s a bit of a skills gap, in terms of the challenges that web archiving can present, as compared to the in-house technical skills at many small organizations. Tools like Archive-It definitely lower the barrier to entry, but still require a certain amount of expertise for troubleshooting and understanding how the tool works. Even as the tools get stronger, the web becomes more and more complex and difficult to capture, so I can’t imagine that it will ever be a totally painless process.

Trevor: You have worked on some very different born-digital collections, processing born-digital materials for StoryCorps in New York and on a TRAC self-audit at ICPSR, one of the most significant holders of social science data sets. While very different kinds of materials, I imagine there are some similarities there too. Could you tell us a bit about what you did and what you learned working for each of these institutions? Further, I would be curious to hear what kinds of parallels or similarities you can draw from the work.


Image of a StoryCorps exhibit at the New Museum in which Emily participated.

Emily: At StoryCorps, I did a lot of hands-on work with incoming interviews and data, so I saw first-hand the amount of effort that goes into making such complex collections discoverable. Their full interviews are not currently available online, but need to be accessible to internal staff. At ICPSR, I was more on the policy side of things, getting an overview of their preservation activities and documenting compliance with the TRAC standard.

StoryCorps and ICPSR are an interesting pair of organizations to compare because there are some striking similarities in the challenges they face in terms of access. The complexity and variety of research data held by ICPSR requires specialized tools and standards for curation, discovery and reuse. Similarly, oral history interviews can be difficult to discover and use without extensive metadata (including, ideally, full transcripts). They’re specialized types of content, and both organizations have to be innovative in figuring out how to preserve and provide access to their collections.

ICPSR has a strong infrastructure and systems for normalizing and documenting the data they ingest, but this work still requires a great deal of human input and quality control. Similarly, metadata for StoryCorps interviews is input manually by staff. I think both organizations have done great work towards finding solutions that work for their individual context, although the tools for providing access to research data seem to have developed faster than those for oral history. I’m hopeful that with tools like Pop Up Archive that will change.

Trevor: Most recently, you’ve played a leadership role in the development of the World Bank’s eArchives project. Could you tell us about this project a little and suggest some of the biggest things you learned from working on it?

Julia Blase and Emily Reynolds present on “Developing Sustainable Digital Archive Systems” at the ALA 2013 Midwinter Meeting. Photo by Jaime McCurry.

Emily: The eArchives program is an effort to digitize the holdings of the World Bank Group Archives that are of greatest interest to researchers. We don’t view our digitization as a preservation action (only insofar as it reduces physical wear and tear on the records), and are primarily interested in providing broader access to the records for our international user base. We’ve scanned around 1500 folders of records at this point, prioritizing records that have been requested by researchers and cleared for public disclosure through the World Bank’s Access to Information Policy.

The project has also included a component of improving the accessibility of digitized records and archival finding aids. We are in the process of launching a public online finding aid portal, using the open-source Access to Memory (AtoM) platform, which will contain the archives’ ISAD(G) finding aids as well as links to the digitized materials. Previously, the finding aids were contained in static HTML pages that needed to be updated manually; soon, the AtoM database will sync regularly with our internal description database. This is going to be a huge upgrade for the archivists, in terms of reducing duplication of work and making their efforts more visible to the public.

It’s been really interesting to collaborate with the archives staff throughout the process of launching our AtoM instance. I’ve been thinking a lot about how compliance with archival standards can actually make records less accessible to the public, since the practices and language involved in finding aids can be esoteric and confusing to an outsider. It has been an interesting balance to ensure that the archivists are happy with the way the descriptions are presented, while also making the site as user-friendly as possible. Anne-Marie Viola, of Dumbarton Oaks, has written a couple of blog posts about the process of conducting usability testing on their AtoM instance, which have been a great resource for me.

Trevor: As I understand it, you are starting a new position as a program specialist with the Institute of Museum and Library Services. I realize you haven’t started yet, but could you tell us a bit about what you are going to be doing? Along with that, I would be curious to hear you talk a bit about how you see your experience thus far fitting into working for the federal funding agency for libraries and museums.

Emily: As a Program Specialist, I’ll be working in IMLS’s Library Discretionary Programs division, which includes grant programs like the Laura Bush 21st Century Librarian Program and the National Leadership Grants for Libraries. Among other things, I will be supporting the grant review process, communicating with grant applicants, and coordinating grant documentation. I’ll also have the opportunity to participate in some of the outreach that IMLS does with potential and existing grant applicants.

Even though I haven’t been in the profession for a very long time, I’ve had the opportunity to work in a lot of different areas, and as a result feel that I have a good understanding of the broad issues impacting all kinds of libraries today. I’m excited that I’ll be able to be involved in a variety of initiatives and areas, and to increase my involvement in the professional community. I’ve also been spoiled by the National Digital Stewardship Residency’s focus on professional development, and am excited to be moving on to a workplace where I can continue to attend conferences and stay up-to-date with the field.

Trevor: Staffing is a big concern for the future of access to digital information. The NDSA staffing survey gets into a lot of these issues. Based on your experience, what words of advice would you offer to others interested in getting into this field? How important do you think particular technical capabilities are? What made some of your internships better or more useful than others? What kinds of courses do you think were particularly useful? At this point you’ve graduated among a whole cohort of students in your program. What kinds of things do you think made the difference for those who had an easier time getting started in their careers?

Emily: I believe that it is not the exact technical skills that are so important, but the ability to feel comfortable learning new ones, and the ability to adapt what one knows to a particular situation. I wouldn’t expect every LIS graduate to be adept at programming, but they should have a basic level of technical literacy. I took classes in GIS, PHP and MySQL, Drupal and Python, and while I would not consider myself an expert in any of these topics, they gave me a solid understanding of the basics, and the ability to understand how these tools can be applied.

I think it’s also important for recent graduates to be flexible about what types of jobs they apply for, rather than only applying for positions with “Librarian” or “Archivist” in the title. The work we do is applicable in so many roles and types of organizations, and I know that recent grads who were more flexible about their search were generally able to find work more quickly. I enjoyed your recent blog post on the subject of digital archivists as strategists and leaders, rather than just people who work with floppy discs instead of manuscripts. Of course this is easy for me to say, as I move to my first job outside of archives – but I think I’ll still be able to support and participate in the field in a meaningful way.

Categories: Planet DigiPres

EaaS: Image and Object Archive — Requirements, Implementation and Example Use-Cases

Open Planets Foundation Blogs - 23 July 2014 - 10:33am
bwFLA's Emulation-as-a-Service makes emulation widely available for non-experts and could prove emulation to be a valuable tool in digital preservation workflows. Providing these emulation services to access preserved and archived digital objects poses further challenges to data management. Digital artifacts are usually stored and maintained in dedicated repositories, and object owners want to – or are required to – stay in control over their intellectual property. This article discusses the problem of managing virtual images, i.e. virtual hard disks bootable by an emulator, and derivatives thereof, but the solution proposed can be applied to any digital artifact.

Requirements

Once a digital object is stored in an archive and an appropriate computing environment has been created for access, this environment should be immutable and should not be modified except explicitly through an administrative interface. This guarantees that a memory institution's digital assets are unaltered by the EaaS service and remain available in the future. Immutability, however, is not easy to handle for most emulated environments. Just booting the operating system may change an environment in unpredictable ways, and when the emulated software writes parts of this data and reads it again, it expects the read data to reflect those modifications. Furthermore, users who want to interact with the environment should be able to change or customize it. Therefore, data connectors have to provide write access for the emulation service while they cannot write the data back to the archive.

The distributed nature of the EaaS approach requires an efficient network transport of data to allow for immediate data access and usability. However, digital objects stored in archives can be quite large: for a hard disk image, the installed operating system together with installed software can easily grow to several GB. Even with today's network bandwidths, copying these digital objects in full to the EaaS service may take minutes and affect the user experience.

While the archived amount of data is usually large, the amount of data that is actually accessed can be very small. In a typical emulator scenario, read access to virtual hard disk images is block-aligned and only very few blocks are actually read by the emulated system. Transferring only these blocks instead of the whole disk image file is typically more efficient, especially for larger files. Therefore, the network transport protocol has to support random seeks and sparse reads without the need for copying the whole data file. While direct filesystem access provides these features if a digital object is locally available to the EaaS service, such access is not available in the general case of separate emulation and archive servers connected via the internet.

Implementation

The Network Block Device (NBD) protocol provides a simple client/server architecture that allows direct access to single digital objects as well as random access to the data stream within these objects. Furthermore, it can be completely implemented in userspace and does not require a complex software infrastructure to be deployed to the archives. In order to access digital objects, the emulation environment needs to reference them; individual objects are identified in the NBD server by unique export names.
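As a rough illustration of this setup (this is not the bwFLA code; paths, port numbers and export names below are invented), the qemu-nbd tool that ships with QEMU can play the role of such a userspace archive server, exposing one read-only export per digital object:

    import subprocess

    # Illustrative sketch: serve archived disk images over NBD, one export name
    # per digital object, so an emulator reads individual blocks on demand
    # instead of copying whole images first. All paths and names are invented;
    # consult the qemu-nbd manual for the exact options on your version.
    ARCHIVE = {
        "object-0001": "/archive/images/win98-base.qcow2",
        "object-0002": "/archive/images/cdrom-7542.iso",
    }

    def serve(export_name: str, path: str, port: int) -> subprocess.Popen:
        return subprocess.Popen([
            "qemu-nbd",
            "-r",               # read-only: the archived master copy stays immutable
            "-t",               # persistent: keep serving across client sessions
            "-p", str(port),    # one qemu-nbd process (and port) per object here
            "-x", export_name,  # clients select the object by its export name
            path,
        ])

    servers = [serve(name, path, 10809 + i)
               for i, (name, path) in enumerate(ARCHIVE.items())]
    # An emulator can now attach, for example, nbd://archive-host:10809/object-0001
    # and will only transfer the blocks it actually reads.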
While the NBD URL schema directly identifies the digital object and the archive where it can be found, these data references are bound to the actual network location. In a long-term preservation scenario, where emulation environments, once curated, should outlast the single computer system that acts as the NBD server, this approach has obvious drawbacks. Furthermore, the cloud structure of EaaS allows any component that participates in the preservation effort to be exchanged, which enables load balancing and fail-safety; this advantage of distributed systems is offset by static, hostname-bound references.

Handle It!

To detach the references from the object's network location, the Handle System is used as the persistent object identifier throughout our reference implementation. The Handle System provides a complete technological framework to deal with these identifiers (or "Handles" (HDL) in the Handle System) and constitutes a federated infrastructure that allows individual Handles to be resolved using decentralized Handle Services. Each institution that wants to participate in the Handle System is assigned a prefix and can host a Handle Service. Handles are then resolved by a central resolver, which forwards requests to these services according to the Handle's prefix. As the Handle System, as a purely technological provider, does not impose any strict requirements on the data associated with Handles, it was chosen as the persistent identifier (PI) technology.

Persistent User Sessions and Derivatives

As digital objects (in this case the virtual disk image) are not to be modified directly in the archive by the EaaS service, a mechanism has to be implemented that stores modifications locally while reading unchanged data from the archive. Such a transparent write mechanism can be achieved using a copy-on-write access strategy. While NBD allows arbitrary parts of the data to be read on request, without requiring any data to be available locally, data that is written through the data connector is tracked and stored in a local data structure. If a read operation requests a part of the data that is already in this data structure, the previously changed version of the data is returned to the emulation component. Conversely, parts of the data that are not in this data structure were never modified and are read from the original archive server. Over time, a running user session builds up its own local version of the data, but only those parts that were actually written are copied.

We used the qcow2 container format from the QEMU project to keep track of local changes to the digital object. Besides supporting copy-on-write, it features open documentation as well as a widely used and tested reference implementation with a comprehensive API, the QEMU Block Driver. The qcow2 format stores all changed data blocks, together with the metadata for tracking these changes, in a single file. To define where the original blocks (before copy-on-write) can be found, a backing file definition is used. The Block Driver API provides a continuous view of this qcow2 container, transparently choosing either the backing file or the copy-on-write data structures as the source.
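
The following is a minimal, purely illustrative Python sketch of this copy-on-write bookkeeping; it is not the actual qcow2 or QEMU Block Driver code, and the block size is an arbitrary assumption.

    class CopyOnWriteImage:
        """Toy copy-on-write layer: writes are kept in a local structure,
        reads of untouched blocks fall back to the unmodified original image."""

        def __init__(self, base_file, block_size=4096):
            self.base = base_file        # read-only handle on the archived image
            self.block_size = block_size
            self.local = {}              # block index -> locally modified bytes

        def read_block(self, index):
            if index in self.local:                  # modified earlier in this session
                return self.local[index]
            self.base.seek(index * self.block_size)  # never modified: read archive copy
            return self.base.read(self.block_size)

        def write_block(self, index, data):
            # The original image is never written to; only the local structure grows.
            self.local[index] = data
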
This copy-on-write mechanism allows data modifications to be stored separately and independently from the original digital object during an EaaS user session, keeping every digital object in the original state in which it was preserved. Once the session has finished, these changes can be retrieved from the emulation component and used to create a new, derived data object. As any Block Driver format is allowed in the backing file of a qcow2 container, the backing file can itself be a qcow2 container, which allows "chaining" a series of modifications as copy-on-write files that contain only the actually modified data. This greatly facilitates efficient storage of derived environments, as a single qcow2 container can be used directly in a binding without having to combine the original data and the modifications into a consolidated stream of data. However, such bindings then rely not only on the availability of the qcow2 container with the modifications, but also on the original data that the container refers to. If this dependency is unwanted, consolidation is still possible and directly supported by the tools that QEMU provides for handling qcow2 files.

Once the data modifications and the changed emulation environment are retrieved after a session, both can be stored again in an archive to make this derivative environment available. Only those chunks of data that were actually changed by the user have to be retrieved; these, however, reference and remain dependent on the original, unmodified digital object. The derivative can then be accessed like any other archived environment. Since all derivative environments contain (stable) references to their backing files, modifications can be stored in a different image archive, as long as the backing file is available. Each object owner is therefore in charge of providing storage for individualized system environments, but is also able to protect its modifications without losing the benefits of shared base images.

Examples and Use-Cases

To provide a better understanding of the image archive implementation, the following three use-cases demonstrate how the current implementation works. Firstly, a so-called derivative is created, a tailored system environment suitable for rendering a specific object. In a second scenario, a container object (CD-ROM) is injected into the environment, which is then modified for object access, i.e. installation of a viewer application and adding the object to the autostart folder. Finally, an existing hard disk image (e.g. an image copy of a real machine) is ingested into the system. This last case requires, besides the technical configuration of the hardware environment, private files to be removed before public access.

Derivatives – Tailored Runtime Environments

Typically, an EaaS provider offers a set of so-called base images. These images contain a basic OS installation which has been configured to run on a certain emulated platform. Depending on the user's requirements, additional software and/or configuration may be needed, e.g. the installation of certain software frameworks or text processing or image manipulation software. This can be done by uploading or making available a software installation package; on our current demo instance this is done either by uploading individual files or a CD ISO image. Once the software is installed, the modified environment can be saved and made accessible for object rendering or similar purposes.
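
As a sketch of how such derivative and chained images can be produced, and later consolidated, with standard QEMU tooling: the image names are placeholders, the exact commands are not taken from the article, and a reasonably recent qemu-img is assumed to be on the PATH.

    import subprocess

    # Derivative on top of a base image: only modified clusters land in derived.qcow2.
    subprocess.run(["qemu-img", "create", "-f", "qcow2",
                    "-b", "base.qcow2", "-F", "qcow2", "derived.qcow2"], check=True)

    # A derivative can itself serve as the backing file of the next one ("chaining").
    subprocess.run(["qemu-img", "create", "-f", "qcow2",
                    "-b", "derived.qcow2", "-F", "qcow2", "derived2.qcow2"], check=True)

    # If the dependency on the chain is unwanted, flatten it into a stand-alone image.
    subprocess.run(["qemu-img", "convert", "-O", "qcow2",
                    "derived2.qcow2", "standalone.qcow2"], check=True)
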
Object Specific Customization

In the case of complex CD-ROM objects with rich multimedia content from the 90s and early 2000s, e.g. encyclopedias and teaching software, a custom viewer application typically has to be installed to render their content. For these objects, an already prepared environment (installed software, autostart of the application) would be useful and would surely improve the user experience during access, as "implicit" knowledge about using an outdated environment is no longer required to make use of the object. Since the number of archived media is large, duplicating for instance a Microsoft Windows environment for every one of them would add a few GB of data to each object. Usually, neither the object's information content nor the current or expected user demand justifies these extra costs. Using derivatives of base images, however, only a few MB are required for each customized environment, since only the changed parts of the virtual image have to be stored for each object. In the case of the aforementioned collection of multimedia CD-ROMs, the derivative size varies between 348 KB and 54 MB.

Authentic Archiving and Restricted Access to Existing Computers

Sometimes it makes sense to preserve a complete user system, like the personal computer of Vilém Flusser in the Vilém Flusser Archive. Such complete system environments can usually be captured by creating a hard disk image of the existing computer and using this image as the virtual hard disk for EaaS. Such hard disk images can, however, contain personal data of the computer's owner. While EaaS aims at providing interactive access to complete software environments, it is impossible to restrict this "interactiveness", e.g. to forbid access to a certain directory directly from the user interface. Instead, our approach to this problem is to create a derivative with all the personal data stripped from the system. This allows users with sufficient access permissions (e.g. family or close friends) to access the original system including personal data, while the general public sees only a computer with all the personal data removed.

Conclusion

With our distributed architecture and an efficient network transport protocol, we are able to provide Emulation as a Service quite efficiently while at the same time allowing owners of digital objects to remain in complete control over their intellectual property. Using copy-on-write technology it is possible to create a multitude of different configurations and flavors of the same system with only minimal storage requirements. Derivatives and their respective "parent" system can be handled completely independently of each other, and withdrawing access permissions for a parent will automatically invalidate all existing derivatives. This allows for very efficient and flexible handling of curation processes that involve the installation of (licensed) software, personal information and user customizations.

Open Planets members can test the aforementioned features using the bwFLA demo instance. Get the password here: http://wiki.opf-labs.org/display/PT/bwFLA+test+demo+instance

Taxonomy upgrade extras: EaaS
Preservation Topics: Emulation
Categories: Planet DigiPres

Archiving video

File Formats Blog - 19 July 2014 - 10:59am

Suppose you see a cop beating someone up for jaywalking, or you’re stopped at one of the Border Patrol’s internal checkpoints. You’ve got your camera, phone, or tablet, so you make a video record of the incident. What do you do next? The Activists’ Guide to Archiving Video has solid advice. Its purpose is to help you “make sure that the video documentation you have created or collected can be used for advocacy, as evidence, for education or historical memory – not just now but into the future.” Most of the advice applies to any video recording of long-term importance; in essence, it’s the same advice you’d get from Files that Last or from the Library of Congress. The guide also covers considerations that especially apply to sensitive video, such as encryption and information that might put people at risk, and it’s a valuable addition to anyone’s digital preservation library.

There’s a PDF version of the guide for people who don’t like hopping around web pages. Versions in Spanish and Arabic are also provided.


Tagged: metadata, preservation, video
Categories: Planet DigiPres

Estonia to adopt new digital document format in 2015 - The Baltic Course

Google News Search: "new file format" - 18 July 2014 - 9:04am

Estonia to adopt new digital document format in 2015
The Baltic Course
Starting 1 January 2015, a new file format, BDOC, will become valid in Estonia for digitally signing documents. The format has been developed according to the international ETSI (European Telecommunications Standards Institute) standards, writes LETA/Eesti ...

Categories: Technology Watch

A VM4C3PO

SCAPE Blog Posts - 17 July 2014 - 2:36pm

We have just set up a Vagrant environment for C3PO. It starts a headless vm in which the C3PO-related functionalities (Mongodb, Play, a downloadable command-line jar) are manageable from the host's browser. Furthermore, the vm itself has all relevant processes configured at start-up independently of vagrant, so, once created, it can be downloaded and used as a stand-alone C3PO vm. We think this could be a scenario applicable to other SCAPE projects as well. The following is a summary of the ideas we've had and the experience we've gained.

The Result

The Vagrantfile and a directory containing all vagrant-relevant files live directly in the root directory of the C3PO repository. So after installing Vagrant and cloning the repository, a simple 'vagrant up' should do all the work: downloading the base box, installing the necessary software and booting the new vm.

After a few minutes one should have a running vm that is accessible from the host's browser at localhost:8000. This opens a central welcome page that contains information about the vm-specific aspects and links to the Play framework's URL (localhost:9000) and the Mongodb admin interface (localhost:28017). It also provides a download link for the command-line jar, which has to be used in order to import data; the jar can be run from outside the vm as the Mongodb port is mapped as well. So I can import and analyse data with C3PO without having to fiddle through the setup challenges myself, and, believe me, that way can be long and stony.
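
As a quick smoke test from the host, the three forwarded services can be checked with a few lines of Python. Only the port numbers 8000, 9000 and 28017 come from the setup described above; the rest of the snippet is a hypothetical convenience, not part of the repository.

    import urllib.request

    services = [
        ("welcome page", "http://localhost:8000/"),
        ("Play application", "http://localhost:9000/"),
        ("Mongodb admin interface", "http://localhost:28017/"),
    ]

    for name, url in services:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                print(f"{name}: HTTP {response.status}")
        except OSError as error:  # covers unreachable ports and timeouts
            print(f"{name}: not reachable ({error})")
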

The created image is self-contained in the sense that, if I put it on a server, anyone who has Virtualbox installed can download it and use it, without having to rely on vagrant working on their machine.

General Setup

The provisioning script has a number of tasks:

  • it downloads all required dependencies for building the C3PO environment
  • it installs a fresh C3PO (from /vagrant, which is the shared folder connection between the git repository and the vm) and assembles the command-line app
  • it installs and runs a Mongodb server
  • it installs and runs the Playframework
  • it creates a port-forwarded static welcome page with links to all the functionalities above
  • it adds all of the above to the native ubuntu startup (using /etc/rc.local, if necessary), so that an image of the vm can theoretically be run independently of the vagrant environment

These are all trivial steps, but it can make a difference not having to manually implement all of them.

Getting rid of proxy issues

In case you're behind one of those very common NTLM company proxies, you'll really like that the only thing you have to provide is a config script with some details about your proxy. If the setup script detects this file, it will download the necessary software and configure maven to use it. Doing it this way was actually the first time I got maven running smoothly on a Linux VM behind our proxy.
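
A hypothetical sketch of what such a provisioning step could write into Maven's ~/.m2/settings.xml is shown below; the environment variable names, default values and file handling are assumptions, and an NTLM proxy may additionally require a local proxy helper in front of it.

    import os
    from pathlib import Path

    # Assumed environment variables; the real setup reads a user-provided config script.
    proxy_host = os.environ.get("PROXY_HOST", "proxy.example.com")
    proxy_port = os.environ.get("PROXY_PORT", "8080")

    settings = f"""<settings>
      <proxies>
        <proxy>
          <id>corporate</id>
          <active>true</active>
          <protocol>http</protocol>
          <host>{proxy_host}</host>
          <port>{proxy_port}</port>
        </proxy>
      </proxies>
    </settings>
    """

    m2_dir = Path.home() / ".m2"
    m2_dir.mkdir(exist_ok=True)               # create ~/.m2 if Maven has never run
    (m2_dir / "settings.xml").write_text(settings)
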

Ideas for possible next steps

There's loads left to do; here are a few ideas:

  • provide interesting initial test-data that ships with the box, so that people can play around with C3PO without having to install/import anything at all.
  • why not have a vm for more SCAPE projects? We could quickly create a repository for something like a SCAPE base vm configuration that is usable as a base for other vms. The central welcome page could be pre-configured (SCAPE-branded), as could all the proxy- and development-environment-related stuff mentioned above.
  • I'm not sure about the sustainability of shell provisioning scripts as the bootstrap process grows more complex. Grouping the shell commands into functions is certainly an improvement, but it might be worth checking out other, more dynamic provisioners. One I find particularly interesting is Ansible.
  • there's currently no way of testing that the vm works with the development trunk; a test environment that runs the vm and checks all the relevant connection bits would be handy

 

Preservation Topics: SCAPE
Categories: SCAPE