Planet DigiPres

Getting JHOVE2 to build

File Formats Blog - 27 February 2013 - 1:58am

There’s a private beta, which should soon be public, of a digital preservation area on StackExchange.com. I took advantage of my invitation to it to ask about something that had stalled me a while ago when I tried to download and build JHOVE2. A quick reply told me that the needed change is simple, just one line in the pom.xml file. I can’t link to my question and the answer on Stack Exchange, since a login is required to view it, but it turns out this issue had already been brought up in a JHOVE2 ticket. The discussion indicates some confusion about whether the issue has been fixed in the main JHOVE2 repository, but Sheila Morrissey has a fork on Bitbucket with the fix.

The fix is to change the URL for “JBoss Repository” in pom.xml to the following:

<url>https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/</url >

Kevin Clarke, who provided the answer, recommends building with the following command line to avoid error messages in the tests:

mvn -DskipTests=true install
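
For anyone who would rather script the pom.xml change than edit it by hand, here is a minimal Python sketch. It is only an illustration, not part of JHOVE2 or of Kevin Clarke's answer. It assumes that pom.xml declares the standard Maven POM namespace and that the repository entry has a <name> element reading exactly "JBoss Repository"; the function name patch_jboss_url is made up for this example. ElementTree discards XML comments when parsing, so a hand edit is gentler on the file.

```python
import xml.etree.ElementTree as ET

# Assumption: the POM declares the standard Maven 4.0.0 namespace.
POM_NS = "http://maven.apache.org/POM/4.0.0"
# The replacement URL given in the post.
NEW_URL = "https://repository.jboss.org/nexus/content/repositories/thirdparty-releases/"


def patch_jboss_url(pom_text: str, repo_name: str = "JBoss Repository") -> str:
    """Return pom_text with the URL of the named repository replaced.

    Hypothetical helper: it matches repositories by their <name> element,
    which is an assumption about how the JHOVE2 POM is laid out.
    """
    # Keep the default namespace unprefixed when serialising again.
    ET.register_namespace("", POM_NS)
    root = ET.fromstring(pom_text)
    ns = {"m": POM_NS}

    patched = 0
    for repo in root.iterfind(".//m:repository", ns):
        name = repo.find("m:name", ns)
        url = repo.find("m:url", ns)
        if name is None or url is None:
            continue
        if (name.text or "").strip() == repo_name:
            url.text = NEW_URL
            patched += 1

    if patched == 0:
        raise ValueError(f"no repository named {repo_name!r} found")
    return ET.tostring(root, encoding="unicode")
```

A hypothetical use would be to read pom.xml, pass its text through patch_jboss_url, write the result back, and then run the mvn command above.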


Tagged: jhove2, software
Categories: Planet DigiPres

5th BlogForever Consortium Meeting & Workshop

blog forever - 26 February 2013 - 7:00pm

The fifth BlogForever Consortium Meeting took place on 11-12 February 2013 in Istanbul, Turkey.

In addition to the meeting, technical partners participated in a full day workshop focused on the development of the BlogForever platform.

Categories: Planet DigiPres

Forestalling Personal Digital Doom

The Signal: Digital Preservation - 26 February 2013 - 3:42pm

The following is a guest post by Tess Webre, an intern with NDIIPP at the Library of Congress

Even though I’m relatively new to the professional library world, I’ve long known that digital preservation on a personal level is a daunting task laced with threats of doom. It requires great amounts of time, energy and consideration with very few visible benefits from day to day.

Blue screen of death, by wlef70 on Flickr

Like organizing a closet, or rearranging a kitchen cabinet, personal digital archiving is easy to put off, easy to forget and easy to make excuses for avoiding. It is tempting to look at disorganized files and pantries and shrug them off by saying “I’ll work on this tomorrow.”  How quickly we realize the error of our ways.

We realize it as soon as a disaster occurs and we find ourselves surrounded by disarray, forced to pick up the pieces. Just as kitchen cabinets need to be organized after a bag of rice spills on the floor, we start thinking about digital archiving after we can’t find a file or access a piece of software.

Who among us has not lost a piece of valuable digital information? Our data can go out with the bang of a laptop being dropped or the whimper of a scratch on a CD. We can lose our data with the excitement of graduation or the mourning of a funeral. Personal files can be lost with the closing of a company or the changing of a leader.

For reasons as diverse as natural disasters and software obsolescence, accessing our data can become impossible. We can discover the inaccessibility of the data immediately, or months or even years after it occurs, and feel the same paralyzing inability to do anything about it. Is there anything more frustrating than this?

It seems to me that there are two paths following the loss of data. Option one: consider this to be a fluke, and continue on with day to day life without making any of the necessary changes to preserve data. By ignoring the greater implications, the data loss will be repeated. It is just a matter of time before more and more data becomes lost.  It will start as a trickle, and become a flood. Until responsibility is taken, this will become a cycle.

Option two: learn the implications of this data loss: if one piece of digital data can easily become inaccessible then any piece of data can become inaccessible. This puts every photograph, every email, every home video, every e-filing and every other bit of memory in your digital history on the proverbial chopping block. That is a terrifying possibility, but not an inevitability as long as steps are taken to actively preserve digital data.

However, taking the next step can be difficult. With the relative newness of digital preservation and the plethora of different storage facilities and types, one can easily get lost in technical jargon and ambiguity. It can lead to more questions than answers.

What exactly is metadata and how do I make sure I have enough? Is it better to preserve material in the cloud or a physical storage media? As the questions mount, so does the frustration. It becomes easy to just give up on trying to preserve personal data for the future, to continue to put off the necessary steps until the next disaster occurs, thus starting the whole process over again.

One of the things I’ve learned during my internship is just how useful the NDIIPP personal digital archiving information is for helping people do the right thing with personal files. I’ve found that the website, written in a clear and relaxed tone (and, really, I’m not trying to curry favor!), and the blog posts by Mike Ashenfelder and others answer all kinds of questions.

I wish I could testify that the NDIIPP information has totally dispelled my fear of impending digital doom. It hasn’t. But it has inspired me with ideas and insights I can apply to push my personal digital threat level back to a safer place.

In the meantime, I wish you all safe data.

Categories: Planet DigiPres

Personal Digital Archiving 2013: “Everyone Should Have an Archive”

The Signal: Digital Preservation - 25 February 2013 - 7:05pm

You should have an archive for your personal digital materials. We all should. Archives preserve memories of “me” as well as “us.” Our personal archives also offer exciting new ways to remember and reconstruct our lives.

The author with George “The Fat Man” Sanger at PDA2013, by wlef70, on Flickr

Attendees of the Personal Digital Archiving 2013 conference considered these ideas over two days of presentations and discussion last week. NDIIPP co-sponsored the event with The Maryland Institute for Technology in the Humanities at the University of Maryland in conjunction with the University of Maryland Libraries. This was the first time the conference has taken place on the East Coast; it originated in 2010 at the Internet Archive in San Francisco, which also hosted meetings in 2011 and 2012.

A major focus of the meeting was on creators of personal digital materials. The opening keynote was from Sally Bedell Smith, best-selling author of Elizabeth the Queen: The Life of a Modern Monarch and other biographies. Smith discussed her evolving use of word processing from embracing one of the early programs in the 1980s through the present day.

Over the course of her career, she has built a very large hybrid personal collection of paper research and digital drafts. She noted a recent adoption of Facebook in association with her most recent book, and speculated on the future usefulness of information received through social media. But, from an evidentiary perspective, Smith lamented the decline in hand-written correspondence, saying that email had an unfortunate uniformity that “resulted in the loss of vital artifactual information, such as the pressure of the pen on the paper.”

“Inside the PC” display at UMD MITH, by wlef70, on Flickr

George “The Fat Man” Sanger also spoke from the creative perspective during the day two keynote. Sanger has composed music for over 200 video games, and was the first video game music producer accepted into the National Recording Academy. He has donated an extensive digital collection documenting his life and career to the UT Videogame Archive at the Briscoe Center, University of Texas at Austin.

Sanger talked about his creative process, which has depended on an evolving digital infrastructure. He also discussed working with archivists to curate his material. The experience of having his collection preserved and organized for use deeply impressed him. “Everyone should have an archive,” he declared. He urged archivists and librarians to make the process as easy as possible while acknowledging that creators also needed to lower their own barriers that get in the way of preservation and stewardship.

The meeting emphasized use of personal digital collections to explore memories and analyze the past. Sudheendra Hangal, in “Engaging users with personal archives through gamification,” discussed using free software to process personal email, including creation of crossword puzzles to jog memory and also to work with Alzheimer’s patients. Smiljana Antonijevic and Ellysa Stern Cahoy presented “Scholarly workflow and personal digital archiving,” which detailed their work to interview faculty about their digital creation and management processes. Similarly, Jenny Shaw talked about interviewing creators in “Hardware and soft skills: surveying scientific personal papers in the digital age.” She spoke of her work in the context of documenting the Human Genome Archive Project at the Wellcome Trust in the UK.

Mat Kelly of the Old Dominion University Web Science and Digital Libraries Research Group has prepared detailed notes (and videos) for many of the conference presentations. Please check them out for more information.

Categories: Planet DigiPres

Two new affiliate members join the Open Planets Foundation

Open Planets Foundation Blogs - 25 February 2013 - 12:13pm

We are pleased to welcome two new affiliate organisations to our membership: Portico, a digital preservation service, and the School of Information and Library Science (SILS) at the University of North Carolina at Chapel Hill, a number one ranked school in the United States.

“We are delighted these organisations have decided to join us,” said Bram van der Werf, Executive Director of OPF. “Both organisations have a significant impact on the digital preservation practice on a global scale. The UNC brings expertise in research and education, and Portico for digital preservation services. As members of OPF, UNC and Portico will both play a major role in the further development of tools which are relevant for the OPF community.”

Both Portico and SILS are enthusiastic to be a part of OPF. “Portico values OPF’s focus on practical solutions and its emphasis on a broad-based and active community of practice. We share OPF’s belief that the digital preservation community as a whole benefits from the wide-spread sharing of experience, tools, and techniques,” said Kate Wittenberg, Managing Director of Portico.

“I am very excited that SILS is joining the OPF and will be helping to extend the reach of the very successful OPF model and community into the US,” said Christopher (Cal) Lee, Associate Professor at SILS.

SILS is the first iSchool to join the OPF, and Portico the first digital preservation service organisation. The OPF now has three member organisations from the US.

The current list of members may be viewed at: http://openplanetsfoundation.org/members. For more information about UNC SILS, visit: http://sils.unc.edu/. For more information about Portico, visit: http://www.portico.org/digital-preservation/.

Preservation Topics: Open Planets Foundation
Categories: Planet DigiPres

Question: Who is/isn't retaining technical provenance notes?

Open Planets Foundation Blogs - 25 February 2013 - 2:15am

If you are, what are you retaining and why?

If not, why not?

There is more to come from us on this topic - but for now I'd love to hear any opinions / thoughts.

And what do I mean by technical provenance?

Good question. I mean any filename sanitisation, QA changes to (meta)data, file structure moves, normalisation data, or details of any technical process that has touched the original bitstream as it was found (at rest, if applicable) on its source medium.
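
To make that concrete, here is one possible shape for such a note, sketched in Python. Everything in it is an assumption made for illustration: the ProvenanceNote fields, the sanitise_filename rule and the action label are invented here and do not come from any particular repository system or standard.

```python
import hashlib
import re
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceNote:
    """One technical change made to an object after it was found on its source medium."""
    action: str          # hypothetical label, e.g. "filename_sanitisation"
    before: str          # value as found
    after: str           # value after the change
    sha256_before: str   # checksum of the bitstream before the change
    sha256_after: str    # checksum of the bitstream after the change
    agent: str           # person or tool that made the change
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a bitstream."""
    return hashlib.sha256(data).hexdigest()


def sanitise_filename(name: str) -> str:
    """Illustrative rule only: replace anything outside A-Z, a-z, 0-9, '.', '_', '-'."""
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)


def note_filename_change(original_name: str, payload: bytes, agent: str) -> ProvenanceNote:
    """Record a filename sanitisation; the bitstream itself is untouched."""
    digest = sha256_hex(payload)
    return ProvenanceNote(
        action="filename_sanitisation",
        before=original_name,
        after=sanitise_filename(original_name),
        sha256_before=digest,
        sha256_after=digest,  # a rename does not alter the content
        agent=agent,
    )
```

Keeping a checksum on both sides of each change is one way to show whether a process touched the bitstream itself or only its name or location.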

Preservation Topics: Preservation Risks
Categories: Planet DigiPres

Pre-publication FTL now available for backers

Files That Last - 23 February 2013 - 11:55pm

The pre-publication version of Files that Last is now done, and instructions for downloading it have been sent to all backers at the $25 level and higher. If you didn’t get that message and think you should have, let me know.

This week I attended the Personal Digital Archiving conference at the University of Maryland and was glad to see how digital preservation is starting to catch on with non-library people! The things I learned there resulted in a few last-minute additions to the book.

Thanks to all of you who backed Files that Last at any level. All of you helped to make it possible. It’s still on target for an April release.


Categories: Planet DigiPres

Born Digital Folklore and the Vernacular Web: An Interview with Robert Glenn Howard

The Signal: Digital Preservation - 22 February 2013 - 3:02pm

Robert Glenn Howard

What do pet cloning websites, YouTube videos of fans playing AC/DC’s “Gone Shootin’”, and discussions of the end times on UseNet all have in common? Answer: Robert Glenn Howard has studied and written about all of them in his ongoing study of the vernacular web. Robert Glenn Howard is the Director of Digital Studies and a Professor in the Department of Communication Arts at the University of Wisconsin – Madison. He is also the editor of the journal Western Folklore. In this installment of Insights, the NDSA innovation working group’s interview series, I am excited to chat with Robert about his approach to studying the web as a folklorist.

Trevor: You come from a folklore background, you edit a folklore journal, but you study online communities. I imagine most folks would think that studying folklore is about studying the long past and that studying online communities would be basically the opposite of that. How did you come to bring a folklore perspective to studying the web and what do you think that perspective does?

Rob: First off, thanks so much for giving me the chance to talk to you about this stuff!

And that is a great question; I wish it weren’t true, but a lot of people imagine “folklore” as “old stuff.” But that just isn’t the case. When the word was invented, it meant the stuff people shared back then—so we think of old stories like Cinderella as being folklore. And at that time there was this idea that true folklore was disappearing because of modernization—but now we know that folklore changes, but it doesn’t ever disappear because it’s really just any stuff (the “lore”) that people (the “folk”) share. A better definition might be that folklore is the informally shared knowledge that we perceive as connecting us to each other.

It’s true, some folklorists study “old” folklore in the sense of studying historical things—like archives of folk tales told in the 1800s for example—but, just like any subject really, there is “new” folklore too. And probably more folklorists study folklore that is being actively shared today than archival stuff.

As far as the Internet goes—it’s not really an option; if you want to study the folklore we have right now, you have to study it where it is practiced—and, for better or worse, network communication is a major place we find people sharing folklore now; since the mid-90s really, when I started my career looking at this sort of stuff. Just like maybe you got a 3×5 card that your grandma wrote a recipe on in the 1980s, today she might type it into an email. The medium of communication may have changed, but it’s still folklore: the informal sharing of common knowledge.

A graph of relationships between terms related to Vaccination from Howard’s research

Trevor: A lot of your work is oriented around the notion of the vernacular web. Could you define the term and give us an example or two of the notion from your work?

Rob: Sure! The idea of a “vernacular web” comes from the famous anthropologist Clifford Geertz. Long before the Internet, he described humans as caught in “webs of signification” that we make for ourselves. His point was that we can’t just make up our own words or our own culture; both what we know and what we can say based on what we know emerges out of a huge web of interaction that has gone on before any of us got here and will go on after any one of us is gone. In the ‘90s, when I started studying the Internet, I was reading Geertz—and it seemed so obvious that the Internet was a real manifestation of that web he was imagining; a web of data sent across networks of communication.

The thing I am specifically interested in, though, is the power of everyday and informal communication.  “Vernacular” just means non-institutional; or outside of any formal or official channels: it’s the everyday stuff we say and do. That is where, in my mind, the real action of the human experience goes on:  in our day-to-day dealing with reality.  So the idea of the “vernacular web” is just that:  the vast interconnected web of everyday communication we enact together to create our shared sense of the world.

In the online environment, this web is really just more obvious: we always have had vernacular webs, but online they are more concrete because they are directly observable through actual shared software-code. In a link from one website to another or in a tweet about a YouTube video that links to a blog, the vast zone of informal discourse online creates “vernacular webs” of interconnected digital media. In these zones, people follow their friends’ posts and links and they pick up digital content, edit it, and then re-share it. They can even take institutional content like a video clip from a TV show or movie and combine it with their own digital video and audio to create whole new hybrid media creations and share them with others in their vernacular web.

While there is, in a sense, one huge vernacular web of all the informal communication going on all the time—it’s really more useful to think of the different vernacular webs we each engage. For example, I like to learn songs by AC/DC to play on the guitar. I am not very good at that, but I can follow links from YouTube video comments to blogs to forums debating the correct way to play pretty obscure rock songs—and, in theory at least, get better at it! That is one of my vernacular webs. I enact another one when I go to a blog to read pros and cons about what vaccines to give my child; and then follow it to a forum full of parents discussing child-rearing tactics. I am actually at least one connection between these two webs (AC/DC and child-rearing); but I tend to engage them at different times and for different reasons.

So really each of us is enacting our own vernacular web all the time. I just focus my study on the role the online sections of it are playing in our society today.

Trevor: I would also be interested in what ways of thinking about the web you see this as being compatible or incompatible with other approaches to theorizing the web. For example, I’ve seen you reference Henry Jenkins’s work on participatory culture, or Dan Perkel’s ethnographic research on DeviantArt. How is your approach to studying the web as vernacular similar to or different from approaches to studying participatory culture, and for that matter, any number of other ways of thinking about online interaction?

Rob: You are right; there are a lot of ways of thinking about this stuff. And “participatory culture” is one of them. But like Dan Perkel has said too, I think, it’s not like the Internet fundamentally changed us. It’s not like we all sat in silence and stared blankly at our TVs waiting for the Internet to show up. We have probably always had vernacular webs of communication.

At different moments in history, though, those webs have occupied different places in our society.  Before there were printed books, not too many people could read.  If you wanted to hear a story, you had to have some real person tell you one; and lots of people could do it.  Many of us still do that sometimes; to our kids or in informal conversations.  All that is pretty participatory.  The same goes for music:  people gather around a guitar or piano and even sing together; they might even hand the guitar around and trade tunes. With printed books and even more with commercially recorded media starting after World War II, an odd period in human history started in which we didn’t tell each other stories and make our own music quite as much; we paid media corporations to do a lot of that for us.  Now that people have a lot more control over the media they can access and share, things are starting to return to normal; what Henry Jenkins calls “participatory.”

But—in the big scope of things—our commercial media age has been a pretty short period in human history; it was maybe 50 years. And besides we never really stopped telling each other stories; and now we are getting back to a more participatory culture because we have all these great tools that make it easy to create, modify, and share the kind of video and music that used to take a whole professional crew to produce. Lawrence Lessig’s idea of “remix” culture is, actually, I think a bit more interesting than “participatory culture”: the idea that media participation in networks is a lot about taking some existing media object and remaking it with our own spin: you see tons of that on YouTube for example. Of course, that is a lot of what folklorists have studied for over 200 years in terms of oral narrative and folk music: individual performances of traditional stories or songs, like telling your own version of Cinderella to a child. Today, we tend to play our own version of a Beatles song and put it up on YouTube, but it’s the same thing: participation by remix! So we are coming back to a place we have been in mostly all along; it just seems strange to us because we all grew up at an odd moment in human history. So we are less used to participating with our media-objects; for the next generation it will be back to a newly networked sort of digital-normal I think.

Jenkins’s idea is of course very well known—he is a PhD from my department at Wisconsin, Communication Arts actually—and it’s a fine idea; but it emphasizes a pretty basic way of thinking about media. Perkel and others’ ethnographic approaches are more sophisticated and jibe better with my idea about the vernacular web because they understand all culture is participatory first, and it’s not really surprising that after a bit of distraction with TV and such, we are getting back to doing what humans do: interact with each other. It’s just that now we are doing it with the technology of our day: network communication.

Trevor: At this point you have been studying and publishing work on online communication since the mid 90s. I would be curious to hear a bit about how your approach and perspective have developed over time.

Rob: Well, I have to admit I have followed a pretty classic trajectory on that! I started out in the early ‘90s being young and naive. I was thinking that the Internet world would vastly improve our world because it would give us all direct access to the information we need to make good choices. Then—as I got older and crankier—I saw that some people choose not to access good information; and some don’t necessarily make the best choices.

My first article back in 1997 was based on work I did in 1994. In that piece I argue that the early Internet was making fundamentalist Christians online more tolerant than that sort of believer typically was. But then in my 2011 book, Digital Jesus, I finished out my work on Christian fundamentalists online basically saying the opposite: the control individuals have over their media consumption can lead them into digital enclaves that actually reduce the need to be tolerant. Basically, for some people, you can really limit your media intake to people and places that you already agree with. In the extreme case I was looking at, things that seem absurd to most people don’t seem so absurd when a group of people have found each other from across the globe and formed an online community based on common beliefs that are pretty rare.

In my new work on parenting forums, it’s kind of the same sort of thing, though; I am looking at how people’s access to each other has allowed them to elevate what I call “vernacular authority” over that of their doctors. So I am interested in when people trust untrained people they find online instead of experts when making choices about their children’s healthcare. That is a far trickier topic than any of my previous work because—frankly—the medical industry has not always behaved super well; but, on the other hand, the vast majority of treatments you will get from a North American doctor are going to be incredibly well studied and most likely make you healthier faster than anything else you could do.

So—there are good and bad things; I used to think the freer the information the better for us all. Now, I think it’s more fair to give people more information, but it also puts a lot of responsibility on people. There is far more information right here at our fingertips than any one person could hope to process in a lifetime. So people just won’t do it; or they won’t have the time to figure out how to do it. In the end, we all have to help each other out when trying to use the information that is out there because there is just so much. That requires more tolerance and paying closer attention to each other, but (for lots of reasons) people are not always able to do that. It’s funny; but I used to think people wouldn’t need to trust each other in a world where we can all get the data for ourselves. Turns out, there is so much data out there now, that we need to help each other just to work through the different chunks of it we don’t have time to figure out! We can’t all be experts in everything; humans have accumulated too much knowledge for that.

Trevor: Right now you have the benefit of directly looking for source material for your work on the open web. Given your perspective, what kinds of online content do you think are the most critical for cultural heritage organizations to preserve for folklorists of the future to study this moment in history?

Rob: Wow. That is a great question; and a really hard one for me. It’s funny, but back in the ‘90s most people weren’t really thinking “How can we archive this Internet stuff?” It seemed like it would always be there; but it’s clear now that as much as stuff stays online for a long time, it also so rapidly changes that our Internet of today looks completely different from the Internet of 1999. I am really happy I saved hundreds and hundreds of full websites I was doing work on way back then. I still get editors complaining that a website I have cited in an article or something no longer works. Well—yes! It’s the Internet! It changes! That’s why it’s cool! So there are certainly lots of great digital art pieces out there; and that should be saved—and it probably will be. But what of the everyday art? Like so much folklore of the past, it’s not so much what we save—but how richly we save it, I think. While it’s great to have hundreds of photoshops, to have a collection of all the top memes, to collect chain emails, and to archive that classic ASCII art, what makes archives from the past most valuable is the fully contextualized examples we have: not just everybody’s tweets (though that is a fantastic thing!), but groups of people tweeting together, their biographies, their feelings about each other, the things they do other than tweet—those contextual details are what make particular archives stand out; and those are the things that will be hardest to recover. We will have lots of examples of video mashups from YouTube in 2013, but how many will we have with fully contextualized comments, interviews with participants, and documentation of which Facebook profiles posted which videos on their walls, and so on? That richness is what I think is hardest and most valuable.

Trevor: If librarians, archivists and curators wanted to learn more about approaches like yours what examples of other scholars’ work would you suggest? It would be great if you could mention a few other scholars’ work and explain what you think is particularly interesting about their approaches.

Rob: Well, there are so many good people out there—lots of up-and-coming researchers in folklore and beyond. It’s tough to really choose. A lot of people are doing really good work on the political side of things; an established scholar like Zizi Papacharissi does great stuff on the digital public sphere, and her book A Private Sphere is great. Mary Gray is another communication scholar doing top-notch ethnographic work among LGBT kids using networks; Out in the Country is her most recent book. In folklore studies, you have a core group of scholars emerging as the group that really focus on the Internet. Trevor Blank is kind of leading that group with his series of anthologies on the topic; and his book on Internet jokes should be coming out soon. That is one to look for. Anthony Buccitelli is another newer scholar. He has a great piece that you will see coming out in Western Folklore very soon on Internet stuff. Of course Russell Frank’s Newslore is proving to be pretty important. And, though it’s about belief more broadly, Andrea Kitta’s new book Vaccinations and Public Concern in History has a really great ethnographic engagement of vaccine rumors as they spread online. She is another scholar to really watch.

A lot of researchers just finishing up their PhDs are really doing some of the best work right now, I think. It’s taken a while, but studying the Internet is really just getting up to speed, and now a new crop of researchers is really going to do some innovative stuff. Two PhD candidates in my department, for example, are really doing great work. Ashley Hinck is studying how online fan communities are able to encourage kids to get engaged in their local real-world communities, and Andrew Peck is documenting emerging forms of digital folklore of all sorts; my favorite being the “Slenderman”: a sort of digitally altered photo that he has captured from its first inception as a hoax to what is a fully evolved legend matrix complete with competing backstories and scores of amateur “photoshops” or digitally altered photos of the monster. Those two young scholars are both people to look for in the future. I have a co-edited book coming out in May as well; we have some really great pieces in there on Internet stuff: Tradition in the 21st Century. Trevor Blank and I are co-editing that one, and I am really happy with it: it really goes after this idea you started with, that folklore isn’t old . . . so, of course, it’s online! But in that book we really get some super smart scholars to think pretty critically about what “tradition” really means in the digital age.

Categories: Planet DigiPres

Some research data management terminology

Unsustainable Ideas - 22 February 2013 - 10:27am

Terminology in this area is confusing, and is used differently in different projects. For the purposes of a report I’m writing, unless otherwise specified, we will use terminology in the following way:

  • Data management is the handling and care of data (in our case research data) throughout its lifecycle. Data management thus will potentially involve several different actors.
  • Data management plans refer to formal or informal documents describing the processes and technologies to be deployed in data management, usually for a research project.
  • Data deposit refers to placing the data in a safe location, normally distinct from the environment of first use, where it has greater chance of persisting, and can be accessed for re-use (sometimes under conditions). Often referred to as data archiving.
  • Data re-use refers to use made of existing data either by its creators, or by others. If re-use is by the data creators, the implication is that the purpose or context has changed.
  • Data sharing is the process of making data available for re-use by others, either by data deposit, or on a peer to peer basis.
  • Data sharing plans refer to the processes and technologies to be used by the project to support data sharing.

Some JISCMRD projects made a finer distinction between data re-use and data re-purposing. I couldn’t quite get that. So I’m balancing on the edge of an upturned Occam’s Razor and choosing the simpler option!

Does this make sense? Comments welcomed!


Categories: Planet DigiPres

How to plan your research data management (planning is not writing the plan!)

Unsustainable Ideas - 21 February 2013 - 5:11pm

David duChemin, a Humanitarian Photographer from Vancouver, wrote a blog post [duC13] at the start of 2013 (in the “New Year Resolution” season) entitled “Planning is just guessing. But with more pie charts and stuff”. He writes:

“Planning is good. Don’t get me wrong. It serves us well when we need a starting point and a string of what ifs.  I’m great at planning. Notebooks full of lists and drawings and little check-boxes, and the only thing worse than planning too much is not planning at all. It’s foolish not to do your due-diligence and think things through. Here’s the point it’s taken me 4 paragraphs to get to: you can only plan for what you’ll do, not for what life will do to you.”

OK he doesn’t really think planning is just guessing; in the post he’s stressing the need for flexibility, but also pointing out that planning (however flawed) is better than not planning.

That blog post is part of what inspired me to write this. Another part is a piece of work that I’m doing that seems to have gone on forever. It seems like a good idea to put this up and see what comments I get that might be helpful.

Planning to manage the data for your research project is not the same thing as filling in a Checklist, or running DMP Online. The planning is about the thinking processes, not about answering the questions. The short summary of what follows is that planning your research data management is really an integral part of planning your research project.

So when planning your research data management, what must you do?

First, find out what data relevant to your planned research exists. You traditionally have to do a literature search; just make sure you do a data search as well. You need to ensure you’re aware of all relevant data resources that you and your colleagues have locally, and data resources that exist elsewhere. Some of these will be tangentially referenced in the literature you’ve reviewed. So the next step is to work out how you can get access to this data and use it if appropriate. It doesn’t have to be open; you can write to authors and data creators requesting permission (offering a citation in return). Several key journals have policies requiring data to be made available, if you need to back up your request.

The next step, clearly, is to determine what data you need to create: what experiments to run, what models, what interviews, what sources to transcribe. This is the exciting bit, the research you want to do. But it should be informed by what exists.

Now before planning how you are actually going to manage this data, you need to understand the policies and rules under which you must operate, and (perhaps even more important) the services and support that are available to you. Hidden in the policies and rules will be requirements for your data management (data security, privacy, backup, continued availability, etc). Hidden in the services and support will be some that will be very useful to you, and will save you time and diverted resources (institutional backup services, institutional data repositories, etc). As suggested above, these services and support could come from your group, your institution, your discipline, your scientific society, or your invisible college of colleagues around the world.

So now you can plan to manage your data. You may need to address many issues:

  • Identification, provenance and version control: how to connect associated datasets with the experimental events and sources from which they derived, and the conditions and circumstances associated.
  • Storage: how and where to store the data, so that you and your colleagues (who may be in other institutions and/or other countries with different data protection regimes) can work on it conveniently but securely. Issues like data size, rate of data creation, rate of data update may all be relevant here. Data backup! Encryption for sensitive data taken off-site. Access control. Annotation. Documentation.
  • Processing: how will you analyse and process your data, and how will you store the results. Back to provenance and version control!
  • Sharing: How to make data available to others, and under what conditions. Where will you deposit it? With what associated information to make it usable? Depends on the data of course, and issues such as data sensitivity. May also depend on data size etc. Which data to share? Which data to report?

That’s not everything but it’s the core. When you’ve done the basic planning at this sort of level, you can get down to writing the Plan! At this point the specific requirements of research funder and institution will come into play, and tools like DCC DMP Online will be useful. They may even remind you of key issues you had forgotten or ignored, or local services you (still) didn’t know about.

At this point you don’t know whether your research will be funded, so there is a limit to the amount of effort you should put into this. NERC wants a very much simplified one-page outline data management plan; it may be more sensible to have a 2 or 3-page plan covering the stuff above, and condense down (or up) as required by your funder.

But you’re still only at the first stage of your research data management planning! If you are lucky enough to get your project funded, there will be a project initiation phase, when you gather the resources (budget, staff, equipment, space). Effectively you’re going to build the systems and establish the protocols that will deliver your research project. At this point you should refine your plan, and add detail to some elements you were able to leave rather vague before. Now you’re moving from good intentions to practical realities. And given that life does throw unexpected events at you (staff leaving, IT systems failing, new regulations coming in), you may need to do this re-planning more than once. Keep them all! They are Records that could be useful to you in the future. In a near-worst case, they could form part of your defence against accusations of research malpractice!

My point is, this isn’t so much good research data management planning, as good planning for your research.

[duC13] duChemin, D. (2013). Planning Is Just Guessing. But With More Pie Charts and Stuff. Vancouver, BC. Retrieved from http://davidduchemin.com/2013/01/planning-and-guessing/


Categories: Planet DigiPres

Community Building is What it’s All About

The Signal: Digital Preservation - 21 February 2013 - 3:20pm

I was exceptionally honored to be asked to give the opening keynote for code4lib 2013, one of the key meetings for library technologists.  People may have thought that I would speak about, well, coding, or repository development, or online tools or even digital preservation.

Leslie Johnston giving opening keynote at code4lib 2013. Photo by James Staub, used with permission from Flickr

But I didn’t. I talked about community building.

The code4lib community is a remarkable one in many ways. It self-organized when its members realized that what they most needed was a way to communicate, sharing challenges and opportunities and technologies and solutions, and to work together to achieve great things.  Which they have.

It’s also remarkable in that it is exceptionally social, and friendly and caring. I could not have been prouder when the community identified a need for an inclusive code4lib anti-harassment policy and code of conduct for its various modes of interaction (email, IRC Chat, and in-person conferences). Which they created and released in the most transparent way possible, on GitHub.

Getting back to my keynote, I had a few key points that I made:

  • Building software requires a community of people who care, whether they are stakeholders, developers, or users.
  • Releasing software requires a similar community of people who care.
  • Sustaining software requires a community of people who really care … enough to contribute time to identifying requirements, writing code, testing, documenting, and evangelizing.
  • Successful software requires a community of users who actively participate in and interact with the community of software creators.

To build a successful community, there are some requirements:

  • Communication.
  • Inclusiveness.
  • Consideration.
  • Even more communication.
  • A sense of ownership.  (I’ve seen software projects and communities fail because they’re shared with the world but no one really takes ownership in its success.)

The signs of a successful community are:

  • Participation.
  • Enthusiasm.
  • A sense of pride. (I’m part of that! I made that happen! That succeeded in part because of me!)
  • Adoption. (Not just of software, but of the mores and activities of the community.)
  • Evangelism. (Check this out! Look at what has been accomplished–don’t you want to use this? And participate?)

The successes of the code4lib community remind me of the work that we’re just really starting with the National Digital Stewardship Alliance. The Library of Congress, as part of our commitment to digital preservation leadership, created the NDSA as a community platform. We bring museums, libraries, archives and other institutions together to collaborate on ideas and products that benefit the nation. Our aim is to work directly with members and get them involved in collective activities with potential for broad benefit. The code4lib community is a model for NDSA to shoot for in terms of organizing, identifying needs and getting work done. Together.

Because the real point that I wanted to make in my keynote, and emphasized near the end of my time at the podium was this:

None of us should.  Ever.  Work.  Alone.  Anymore.

Period.

Categories: Planet DigiPres

Personal Digital Archiving: How Well Do You Score?

The Signal: Digital Preservation - 20 February 2013 - 2:45pm

The following is a guest post by Tess Webre, an intern with NDIIPP at the Library of Congress

Old Technology 10 by crabchick, on Flickr

I love dumb magazines. Love ‘em. The more photoshopped the front cover, the more vapid the articles, the more stereotypical the content, the more I love them. In college, while I was spelling woman with a y and trying to bring down the patriarchy, I hid my fondness for their idiocy in a thick layer of hostility and a canned speech about gender binaries.

As an adult I have rediscovered their silliness and banality, while being thoroughly entertained. Most of all, I adore the quizzes with titles such as “Is your guy a zero or a hero?” or “What is your hair personality?”.  So I thought, why not make a quiz on digital preservation complete with over the top puns, graphics, and dated pop culture references? Why not indeed, I ask?

So I did, here it is below.  Enjoy!


QUIZ:  Does your personal digital preservation strategy need a makeover?

A. When was the last time you updated your storage media?

  1. Justin Bieber was still in diapers.
  2. The Artist formerly known as Prince was still known as Prince
  3. I updated it when I heard Gangnam Style this Summer.
  4. The first Mötley Crüe reunion tour.

B.  What kind of annotation do you have on your files?

  1. I never metadata I didn’t like
  2. My files are like minimalist art, bare
  3. Dublincore? More like DublinSNORE
  4. Sometimes, I remember to annotate with xml, but mostly I’m emailing my ex.

C.  Oh no! You dropped your phone! What about all of your text messages, contacts and other files?

  1. Gone forever.
  2. I might have some files backed up by my service provider, but I don’t know how many….
  3. I backed up my files a few times, but don’t know if I can still access all of the files
  4. I’ve been up to date on my backups and know exactly what I lost.

D.  You have just completed a project for work, but realize that you’ve saved it in a proprietary file format.

  1. Carpe diem, (Latin for YOLO). I don’t have time to convert it.
  2. I’ve made sure to convert it to an open format so that I can access it later.
  3. I know that I will have access to the software in the future. No need to worry.
  4. The proprietary format will lose a lot of its original format if I convert it, maybe it won’t be such a big deal.

E.  Finish the sentence: “My digital legacy …”

  1. Is secure, as I have created a clear plan as to what I want done with my digital files, accounts, etc. for my future heirs.
  2. Is far in the future, I’ve determined that it’s way too early to be worrying about it.
  3. Is completely uncoordinated. I don’t have anything worth passing on to the future.
  4. Is secure. I’ve given the gist of it to friends and family, but don’t need to write anything down.

F.  It’s 10 pm, do you know where all your files are?

  1. Yes, they are all on my hard drive, in my phone and on my digital camera.
  2. No, I don’t.
  3. Yes, there are files here and in another location.
  4. I don’t even know what I have.

G.  True or False: my current operating system will never be obsolete.

  1. True
  2. False

(Correct Answers: 3, 1, 4, 2, 1, 3, False.)

For each correct answer give yourself a point.

If you got 0-2 correct:

Your data plan is out of date. You need to give your personal digital preservation plans a makeover.  Try reading up on some new ways to update your digital preservation style here.

If you got 3-5 correct:

You’ve gotten a few right, but still should update your preservation style. Don’t turn your data into a problem. Try reading up on some new ways to update your digital preservation style here.

If you got 6-7 correct:

Congrats, you have the right stuff to preserve your data. Rock on! You know about the ins and outs of preserving your digital data. But make sure that you keep up to date on future updates.

Until next time, I wish you all safe data.

Categories: Planet DigiPres

Call for Proposals: The Scholarly Use of Web Archives

The Signal: Digital Preservation - 19 February 2013 - 9:03pm

Each year the International Internet Preservation Consortium holds a day-long public conference in conjunction with their General Assembly. This year’s theme is Scholarly Access to Web Archives: Progress, Requirements, and Challenges; the open meeting will be held Thursday, April 25, 2013 in Ljubljana, Slovenia.

Presentations are sought that will contribute to the discussion of the scholarly use of web archives, including but not limited to

  • the ways in which web archives are currently used by scholars and researchers,
  • how scholars would like to use web archives but cannot due to IP or technical barriers,
  • a discussion of web archives as scholarly resources, and
  • how to provide scholarly access to web archives.

Examples of specific collections that are created for specific scholarly uses, and examples of how broad or domain-wide collections are used in a scholarly context are encouraged.

Video: Social Scientists Using Web Archives to Study World News

Limited, half-day time-slots are also available for workshops or training for specific web archiving tools, concepts, or issues. Past workshops included a legal issues discussion, a hands-on Hadoop training, and a Crowdsourcing exercise. Workshops will be presented on Friday, April 26, 2013 in Ljubljana, Slovenia.

If you are interested in proposing a presentation or workshop see the detailed instructions at the IIPC website. Proposals are due  February 22, 2013. The program will be announced March 4, 2013.

2013 is the ten-year anniversary of the IIPC. The IIPC is a membership organization dedicated to improving the tools, standards, and best practices of web archiving while promoting international collaboration and the broad access and use of web archives for research and cultural heritage.

Categories: Planet DigiPres

The DOI has no clothes, and Publishers have taken them away!

Unsustainable Ideas - 19 February 2013 - 11:57am

So what’s the Digital Object Identifier for, really? I thought it was a permanent identifier so that we could link from one article to the articles it references in a pretty seamless fashion. OK, not totally seamlessly, since a DOI is not a URI, but all we have to do is stick http://dx.doi.org/ on the front of a DOI, and we’re there. So we should end up with an almost seamless worldwide web of knowledge (not Web of Knowledge™, that’s someone’s proprietary business product).
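As a minimal sketch of that "stick it on the front" step: the function name doi_url is made up for illustration, and the DOI is the Nature Materials one cited further down. No URL-encoding is attempted, so this only covers DOIs made of plain URL-safe characters.

```shell
# Turn a bare DOI into a resolvable URL by prepending the resolver prefix.
# doi_url is a hypothetical helper name, not part of any DOI tooling.
doi_url() {
    printf 'http://dx.doi.org/%s\n' "$1"
}

doi_url "10.1038/nmat3566"
```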

Obviously the Publishers must play a large part in making this happen. They support the DOI system through their membership of Crossref, and supplying the metadata to make it work. And sometimes they remember that when they transfer a journal from one publisher or location to another, they can fix the resulting mess simply by changing the redirect inside the DOI system. (And sometimes they forget, but that’s another story.)

And of course, these big, toll-access, subscription-based Publishers trumpet all the Added Value that their publishing processes put onto the articles that we write and give to them (and referee for them, and persuade our libraries to buy for them, and…). So obviously that Added Value will extend to ensuring that all references have DOIs where available? A pretty simple thing to add in the copy-editing stage, I would have thought.

Except that they don’t. They display few if any DOIs in their reference lists of “their” articles. In fact my limited, non-scientific evidence-collecting suggests to me that they probably do the opposite to Adding Value: remove DOIs from manuscripts submitted to them. OK, I have no direct evidence of the removal claim, but I reckon there is pretty good circumstantial evidence.

I don’t have a substantial base of articles to work from (not being affiliated with a big library any more), but I’ve had a scan at the reference section of several recent articles from a selection of publishers. What do I see?

Take for example this editorial in Nature Materials:

Nature. (2013). Beware the impact factor. Nature materials, 12(2), 89. doi:10.1038/nmat3566

Yes, there’s a DOI in the reference I used. Mendeley picked that DOI up automatically from the paper. If I use that paper in a reference, the DOI will be included by Mendeley. This presumably  also happens with EndNote and other reference managers. (Here’s me inserting a citation for (Shotton, Portwin, Klyne, & Miles, 2009) from EndNote… yes, there it is, down the bottom with a big fat DOI in it.) (This is part of my circumstantial evidence for Value Reduction by Publishers! We give them DOIs, they take them away.)

Anyway, looking at that Nature editorial, there are no DOIs in the reference list. Reference 7 is:

7. Campanario, J. M. J. Am. Soc. Inf. Sci. Technol. 62, 230–235 (2011).

I tried copy/pasting that into Google. I get two results, neither of which appears to be a JASIST article. OK let’s try this one, in a completely different field, from an Elsevier journal:

McCabe, M. J., Snyder, C. M., & Fagin, A. (2013). Open Access versus Traditional Journal Pricing: Using a Simple “Platform Market” Model to Understand Which Will Win (and Which Should). The Journal of Academic Librarianship, 39(1), 11–19. doi:10.1016/j.acalib.2012.11.035

Again, none of the referenced articles have DOIs included in the reference list. Here’s a recent reference:

Jeon, D. -S.,&Rochet, J. -C. (2010). The pricing of academic journals: A two-sided market perspective. American Economic Journal: Microeconomics, 2, 222–255.

Maybe that article (and all of the others) doesn’t have a DOI? Same trick with Google, we don’t get there straight away, we get to another search, for articles with the word “perspective” in that journal… which does get us to the right place. And yes, the article does have a DOI (10.1257/mic.2.2.222). Let’s try this article; surely Nucleic Acids Research is one of the good guys?

Fernández-Suárez, X. M., & Galperin, M. Y. (2013). The 2013 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection. Nucleic acids research, 41(D1), D1–7. doi:10.1093/nar/gks1297

No DOIs in the reference list. Here’s an odd one, from Nature again:

Piwowar, H. A. (2013). Value all research products. Nature, 493(7431), 159. doi:10.1038/493159a

Here they include no DOIs for actual articles, but there are URL-DOIs for Figshare! The first two references are:

1. Priem, J., Costello, K. & Dzuba, T. Figshare http://dx.doi.org/10.6084/m9.figshare.104629 (2012).

2. Fausto, S. et al. PLoS ONE 7, e50109 (2012).

Do the latest OA publishers do any better? Sadly, IJDC appears not to show DOIs in references. I couldn’t see any in references in the most recent PLoS one article I looked at (Grieneisen and Zhang, 2012). Nor Carroll (2011) in PLoS Biology. But yes, definitely some DOIs in references in Lister, Datta et al (2010) in PLoS Computational Biology.

What about the newest kid on the block? You know, the cheap publisher who’s going to lead to the downfall of the scholarly world as we know it? Yes! The wonderful article by Taylor and Wedel (2013) in PeerJ has references liberally bestowed with DOIs!

When I tweeted my outrage about this situation, someone suggested it’s just the publishers simply following the style guides. WTF?

Publishers! You want us to believe you are adding value to our articles? Then use the Digital Object Identifier system. Keep the DOIs we give you, and add the DOIs we don’t!

PS At one stage in preparing for this post I tried copying reference lists from PDFs and pasting them into Word. You should try it some time. It’s an absolute disaster, in many cases! Which is NOT the fault of PDF, it is the fault of the system used to create the PDF… ie the Publisher’s system. Added Value again?

PPS: here’s that reference inserted by EndNote:

Shotton, D., Portwin, K., Klyne, G., & Miles, A. (2009). Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol, 5(4), e1000361. http://dx.doi.org/10.1371%2Fjournal.pcbi.1000361

EDIT: As the comments below suggest, my post is generally true insofar as PDF versions of articles are concerned, although even there some publishers (eg BioMedCentral) do incorporate a hidden clickable link behind the reference (in BMC’s case to PubMed rather than the DOI). Several publishers have MUCH better behaviours in their HTML versions, with both explicitly visible DOIs and clickable versions of references. Sadly, HTML has no agreed container format, and is next to useless for storing articles for later reference, so it is most likely that the articles you store and use on your computer will be the sort of stunted PDFs I describe here. I still claim: this is not good enough.


Categories: Planet DigiPres

Boskone and dataliths

Files That Last - 18 February 2013 - 3:07pm

On Saturday I was on a panel at Boskone on “Dataliths: Digging the idea of the programmer/archaeologist,” along with Vernor Vinge, Charles Stross, Dana Cameron, and Janice Gelb. As you’d expect at a science fiction convention, the discussion got very speculative at times, with ideas like “life recorders” that keep a complete record of each person’s activities and “data diamond” that stores 1’s as carbon-13 atoms and 0’s as carbon-12 atoms. Vinge brought up some interesting ideas about how to provide bootstrapping information for an archive that might be read after the collapse of civilization, and Stross showed he’s quite an expert on computer issues. This was my first opportunity to show off the cover of Files that Last and give out promotional cards at a convention.

The high point for me came later on, when Vinge thanked me for inventing the term “datalith.”

The advance version for people who pledged at the $25 level will be available this week. Those of you who’ve seen the pre-advance version, I’d appreciate it if you could send me any comments on it as soon as you can.


Categories: Planet DigiPres

On Building a Debian Package of a Ruby Program

Open Planets Foundation Blogs - 18 February 2013 - 2:25pm

Building a Debian package from a program written in Ruby is not a straightforward task. This post is intended as a step-by-step practical guide to packaging Ruby programs, based on the lessons we learned during the debianization process.

In this guide we will use a sample program: Pagelyzer (http://wiki.opf-labs.org/display/TR/Pagelyzer). It is an interesting example because of its complexity: it contains Ruby code, Java, JavaScript, as well as some binary libraries in C.

Packaging Ruby scripts is not that different from packaging other software, but it follows different rules. Debian packaging software relies on standard Linux development tools, such as make. This step is crucial to constructing a deb package.

As Ruby is normally interpreted (it can be compiled, but that is not usual), the make command will not work, because there is no Makefile. The Ruby community has therefore put in place an option for going through the whole process. The proutils Ruby project provides everything needed to create a deb package. Its goal is to work in the same way as the make command, so that the packaging software won't complain during the process.

This tutorial is a summary based on the earlier tutorial by Ubuntu developer David Green, posted in September 2012 (https://wiki.ubuntu.com/PackagingGuideDeprecated/Ruby).

Requirements

In this section we will describe the software and file structures needed to make the package.

Setting up the environment

Here's what we need to begin packaging our software:

  • ruby
  • wget
  • dh-make
  • build-essential
  • fakeroot
  • cdbs
  • debhelper
  • ruby-pkg-tools

The corresponding apt-get command:

$ sudo apt-get install ruby1.9.1-full wget dh-make build-essential fakeroot cdbs debhelper ruby-pkg-tools

Some of the tools introduced below will look for two environment variables to guess your name and email address to put in the package metadata, let's set them up here:

$ export DEBFULLNAME="Your Name"
$ export DEBEMAIL="Your.Email@address.here"

You should also add these to your .bashrc or other shell startup script if you want them to be set up automatically.
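Before running the packaging tools later on, it can be worth confirming that both variables are really set in the current shell. A minimal sketch follows; check_deb_identity is a hypothetical helper name, not part of any Debian tool.

```shell
# Report which of the two identity variables are missing from the environment.
# check_deb_identity is a hypothetical helper, not something shipped by dh-make.
check_deb_identity() {
    missing=0
    for var in DEBFULLNAME DEBEMAIL; do
        # Look up the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set" >&2
            missing=1
        fi
    done
    # Exit status 0 means both variables are set, 1 means at least one is missing.
    return "$missing"
}
```

Calling check_deb_identity prints nothing and returns 0 when both variables are set.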

Creating the Source Archive

To create the source archive we need to:

  • Create a new directory for the project.
  • Download setup.rb
  • Create the directory structure.
  • Add the files.
  • Test that it works.
  • Create the tarball.

Create a Directory

Create a directory named in the format package-name-version (we will use pagelyzer-ruby-0.9) and change into it:

$ mkdir pagelyzer-ruby-0.9
$ cd  pagelyzer-ruby-0.9

Getting setup.rb

We need to download the setup.rb file from:

$ wget http://i.loveruby.net/archive/setup/setup-3.4.1.tar.gz

We only need the setup.rb file; the rest of the files in the extracted folder can be deleted.

Or download it from the attachment (bottom of the page): http://www.openplanetsfoundation.org/system/files/setup-3.4.1.zip
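One way to keep only setup.rb from the downloaded tarball is sketched below. The helper name extract_setup_rb is hypothetical, and because the internal layout of the archive is an assumption here, the function searches for setup.rb instead of hard-coding a path.

```shell
# Copy setup.rb out of a gzipped tarball into the current directory,
# discarding everything else. extract_setup_rb is a hypothetical helper.
extract_setup_rb() {
    tarball="$1"
    tmpdir="$(mktemp -d)" || return 1
    # Unpack into a scratch directory so the project folder stays clean.
    tar -xzf "$tarball" -C "$tmpdir" || { rm -rf "$tmpdir"; return 1; }
    # The archive layout is assumed unknown: take the first setup.rb found.
    found="$(find "$tmpdir" -type f -name setup.rb | head -n 1)"
    if [ -n "$found" ]; then
        cp "$found" ./setup.rb
        status=$?
    else
        echo "setup.rb not found in $tarball" >&2
        status=1
    fi
    rm -rf "$tmpdir"
    return "$status"
}
```

From inside pagelyzer-ruby-0.9 this would be called as extract_setup_rb setup-3.4.1.tar.gz, after which the tarball itself can be removed.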

Create the Directory Structure

The directory structure used by setup.rb is as follows:

PackageTop/
    lib/
        (ruby scripts)
    ext/
        (ruby extensions)
    bin/
        (commands)
    data/
        (data files)
    etc/
        (configuration files)
    man/
        (manual pages)
    test/
        (tests)

(taken from the setup.rb manual)

Create these directories:

$ mkdir lib ext bin data etc man test

Create other directories that will be used:

$ mkdir man/man1 data/pagelyzer-ruby data/pagelyzer-ruby/js data/doc data/doc/pagelyzer-ruby

Add the Files

The following table shows where each file goes in the directory structure:

File                          Folder
pagelyzer_analyzer            bin
pagelyzer_capture             bin
pagelyzer_changedetection     bin
pagelyzer_block.rb            lib
pagelyzer_convex_hull.rb      lib
pagelyzer_dimension.rb        lib
pagelyzer_driver.rb           lib
pagelyzer_heuristic.rb        lib
pagelyzer_point.rb            lib
pagelyzer_separator.rb        lib
pagelyzer_url_utils.rb        lib
pagelyzer_util.rb             lib
js/compress_js.rb             data/pagelyzer-ruby/js
js/decorate.js                data/pagelyzer-ruby/js
js/decorate_mini.js           data/pagelyzer-ruby/js
marcalizer.zip                data/pagelyzer-ruby
pagelyzer_diff.jar            data/pagelyzer-ruby

Note: All .rb files in the bin and lib folders should be executable. Otherwise, setup.rb will not include them.
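A small sketch for setting the executable bit in one go is shown below. The helper name make_scripts_executable is hypothetical. It treats every file in bin as a command, since the bin entries listed above have no .rb extension; that is an interpretation of the note, not something setup.rb documents.

```shell
# Make every file in bin/ and every .rb file in lib/ executable, then list
# anything that was missed. make_scripts_executable is a hypothetical helper;
# its argument is the package top directory (defaults to the current one).
make_scripts_executable() {
    top="${1:-.}"
    # Commands in bin/ (assumed to all be scripts that must be executable).
    find "$top/bin" -type f -exec chmod +x {} +
    # Ruby library files in lib/.
    find "$top/lib" -type f -name '*.rb' -exec chmod +x {} +
    # Print any file that is still not executable by its owner;
    # no output means nothing was missed.
    find "$top/bin" -type f ! -perm -u=x
    find "$top/lib" -type f -name '*.rb' ! -perm -u=x
}
```

Run from the package top directory, make_scripts_executable . should print nothing when all the files are in order.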

We need to create a manpage for each executable file in /usr/bin. To do this, edit man/man1/pagelyzer_changedetection.1. Here is a small example; a real manpage should be more extensive.

.TH pagelyzer_changedetection 1 "JAN 20 2013" "Andrés Sanoja"
.SH NAME
pagelyzer_changedetection \- a tool for detecting changes in web pages and their rendering
.SH SYNOPSIS
.B pagelyzer_changedetection
.BR [string]
.PP
.SH DESCRIPTION
Covers the change detection process: capture, segmentation, version analysis (visual and structural)
.PP
.SH AUTHOR
.TP
Andrés SANOJA <andres.sanoja@lip6.fr>
Myriam Ben Saad <myriam.ben-saad@lip6.fr>
Marc Law <marc.law@lip6.fr>
Carlos Sureda <carlos.sureda@lip6.fr>
Jordi Creus <Jordi.Creus@lip6.fr>

Note: manpages are written in the nroff format. You can also use other formats such as ri or pod and convert them to nroff.

Test That it Works

Install pagelyzer-ruby on your system using setup.rb directly:

$ ruby setup.rb config
$ sudo ruby setup.rb install

Next run:

$ capture.rb --url=http://www.lip6.fr

which should output the web page screenshot, decorated file and source code in the ~/pagelyzer/out folder

Also, test that the manpage works:

$ man pagelyzer_analyzer

To uninstall run:

$ sudo rm -rfi `cat InstalledFiles`

Delete the '.config' file:

$ rm .config

Create the Tarball

Create a gzipped tar archive of the working folder:

$ cd ..
$ tar cavf pagelyzer-ruby-0.9.tar.gz pagelyzer-ruby-0.9

This should create your source archive, pagelyzer-ruby-0.9.tar.gz.

The Packaging Process

To create a package we need to:

  • Set up the extra files needed for packaging (in the debian/ directory).
  • Remove the unnecessary files that were created.
  • Edit the debian/rules file.
  • Edit the debian/control file.
  • Edit the debian/postinst file.
  • Edit the other meta data files.
  • Build the packages.

Set Up the Extra Files

We are going to use dh_make, which will create a template for us to work from. Run:

$ dh_make -c lgpl -s -r cdbs -f ../pagelyzer-ruby-0.9.tar.gz

which means: -c lgpl tells it that the package is licensed under the LGPL license, -s tells it that we just want one binary package, -r tells it to use CDBS, Common Debian Build System, which will make our packaging simple, so we can concentrate on the Ruby specific things. -f ../pagelyzer-ruby-0.9.tar.gz tells it that we are using the ../pagelyzer-ruby-0.9.tar.gz file as our source.

You should see something like:

Maintainer name : Your Name
Email-Address   : Your.Email@address.here
Date            : Wed, 24 Jan 2013 19:53:51 +0530
Package Name    : pagelyzer-ruby
Version         : 0.9
License         : lgpl3
Using dpatch    : no
Using quilt     : no
Type of Package : cdbs
Hit <enter> to confirm:
Currently there is no top level Makefile.  This may require additional tuning.

Please edit the files in the debian/ subdirectory now.

Before we look at what has happened inside the pagelyzer-ruby-0.9/ directory, let's see what has happened to the directory above it:

$ ls ..

You'll notice a file there that we didn't create: pagelyzer-ruby_0.9.orig.tar.gz. In addition to the binary package, the packaging tools also generate a source package, which consists of three files: ${PKGNAME}_${VER}.orig.tar.gz (the original upstream tarball), ${PKGNAME}_${VER}-${PKGVER}.diff.gz (a diff file for the debian/ directory) and ${PKGNAME}_${VER}-${PKGVER}.dsc (a signed summary of the source package). Because we told dh_make where our upstream source tarball was, it renamed it appropriately (${PKGNAME}_${VER}.orig.tar.gz). We could just as well have renamed it ourselves and not passed the -f option, but we chose to be lazy!
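The naming scheme can be sketched in shell. This is only an illustration: the function name source_package_files is ours, and the Debian revision 1 in the usage line is an assumed value, not one taken from this package.

```shell
#!/bin/sh
# Print the three file names that make up a Debian source package,
# following the ${PKGNAME}_${VER}[-${PKGVER}] pattern described above.
# source_package_files is a hypothetical helper, not part of any Debian tool.
source_package_files() {
    pkgname=$1   # e.g. pagelyzer-ruby
    ver=$2       # upstream version, e.g. 0.9
    pkgver=$3    # Debian revision (assumed value in the example below)
    printf '%s_%s.orig.tar.gz\n' "$pkgname" "$ver"
    printf '%s_%s-%s.diff.gz\n' "$pkgname" "$ver" "$pkgver"
    printf '%s_%s-%s.dsc\n' "$pkgname" "$ver" "$pkgver"
}

source_package_files pagelyzer-ruby 0.9 1
```

Note the underscore between the package name and the version: the upstream tarball uses a hyphen, the Debian source files do not.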

Rename debian/postinst.ex file

postinst.ex is a template we will need later. Rename it to postinst (without the extension):

$ mv debian/postinst.ex debian/postinst

Remove Unnecessary Files

Some of the files created are examples and not required. We can delete those with this command:

$ rm debian/*.ex debian/*.EX debian/READ*

Edit debian/rules

Set the contents of debian/rules to this:

#!/usr/bin/make -f
# -*- mode: makefile; coding: utf-8 -*-

include /usr/share/cdbs/1/rules/debhelper.mk
include /usr/share/ruby-pkg-tools/1/class/ruby-setup-rb.mk

This tells the packager to use setup.rb to create the package.

Edit debian/control

Edit the contents of debian/control to something like this:

Source: pagelyzer-ruby
Section: misc
Priority: extra
Maintainer: Andrés Sanoja <andres.sanoja@lip6.fr>
Build-Depends: cdbs, debhelper (>= 8.0.0), ruby-pkg-tools
# ruby1.9.1-full, libxslt-dev, libxml2-dev, openjdk-7-jdk, imagemagick, ruby1.9.1-dev
Standards-Version: 3.9.2
Homepage: http://wiki.opf-labs.org/display/TR/Pagelyzer
#Vcs-Git: git://git.debian.org/collab-maint/pagelyzer.git
#Vcs-Browser: http://git.debian.org/?p=collab-maint/pagelyzer.git;a=summary

Package: pagelyzer-ruby1.9.1
Architecture: amd64
Depends: ruby1.9.1, cdbs, debhelper (>= 8.0.0), ruby-pkg-tools, libxslt-dev, libxml2-dev, openjdk-6-jdk, imagemagick, ruby1.9.1-dev, ${shlibs:Depends}, ${misc:Depends}
# ruby1.9.1-full
# openjdk-7-jdk
Description: Suite of tools for detecting changes and its rendering
 Tool for the web pages comparison based on structural and visual approach.
 Research challenge for this tool is the learning algorithm based on frequency.
 .
 Pagelyzer is a tool which compares two web pages versions and decides if they
 are similar or not.
 .
 It is based on:
  * a combination of structural and visual comparison methods embedded in a
 statistical discriminative model,
  * a visual similarity measure designed for Web pages that improves change
 detection,
  * a supervised feature selection method adapted to Web archiving.
 .
 We train a Support Vector Machine model with vectors of similarity scores
 between successive versions of pages. The trained model then determines whether
 two versions, defined by their vector of similarity scores, are similar or not.
 Experiments on real Web archives validate our approach.

Package: pagelyzer-ruby
Architecture: amd64
Depends: pagelyzer-ruby1.9.1, ${misc:Depends}
# , ruby1.9.1-full, cdbs, debhelper (>= 8.0.0), ruby-pkg-tools, libxslt-dev, libxml2-dev, openjdk-6-jdk, imagemagick,ruby1.9.1-dev, ${shlibs:Depends}
# openjdk-7-jdk
Description: Suite of tools for detecting changes and its rendering
 metapackage
 Suite of tools for detecting changes and its rendering.
 Dummy package for pagelyzer-ruby1.9.1

Note that we need to split the package into a Ruby-version-dependent package (which depends on ruby1.9.1) and a dummy package that depends on the version-dependent one. If we don't do this, the packaging process will seem to work, but none of the files we created will be in the resulting .deb files! (Remark made by SevenMachines on the Ubuntu Forums thread.)

Edit debian/postinst actions

Some Ruby gems must be present for the software to work properly. In the debian/postinst file (the one renamed from postinst.ex earlier) add the following:

...
# dh_installdeb will replace this with shell code automatically
# generated by other debhelper scripts.

sudo ln -sf /usr/bin/ruby1.9.1 /usr/bin/ruby
sudo ln -sf /usr/bin/gem1.9.1 /usr/bin/gem


sudo gem install --version '= 0.8.6' hpricot
sudo gem install --version '= 1.5.5' nokogiri
sudo gem install --version '= 2.0.3' sanitize
sudo gem install --version '= 2.29.0' selenium-webdriver


#DEBHELPER#
...
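Because a postinst script can run again on upgrades, one option is to skip gems that are already present. The following is a minimal sketch, not what pagelyzer ships: it assumes that RubyGems' gem list -i NAME -v VERSION check exits successfully when that version is installed, and the helper name ensure_gem and the GEM override variable are ours.

```shell
#!/bin/sh
# Hypothetical helper: install a pinned gem version only if it is missing.
# GEM can be overridden (for example GEM=gem1.9.1); it defaults to "gem".
GEM=${GEM:-gem}

ensure_gem() {
    name=$1
    version=$2
    # "gem list -i" treats the name as a pattern, so anchor it on both ends.
    if "$GEM" list -i "^${name}\$" -v "$version" >/dev/null 2>&1; then
        echo "$name $version already installed"
    else
        "$GEM" install --version "= $version" "$name"
    fi
}

# Usage, with the versions listed in the postinst above:
# ensure_gem hpricot 0.8.6
# ensure_gem nokogiri 1.5.5
# ensure_gem sanitize 2.0.3
# ensure_gem selenium-webdriver 2.29.0
```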

Edit the Other Files

Edit debian/changelog and debian/copyright. Make sure you edit these correctly - especially the debian/copyright file.

The SCAPE project uses Git for version control, so all the changelog information is already there. The easiest way to produce debian/changelog is to download the gitlog-to-deblog script (https://github.com/rackerhacker/gitlog-to-deblog), change into your Git working folder and generate the changelog file from the commit history.

Pay attention to the version numbers and the package name in the generated file: they must match the ones used in the rest of the package. In our case the version is 0.9, but the entries generated from Git can differ a bit. For example,

pagelyzer (initial-11-gbbcc12f) unstable; urgency=low

  * Including performance test and enhancements in change_detection.rb

should be changed to something like this:

pagelyzer-ruby (0.9-11-gbbcc12f) unstable; urgency=low

  * Including performance test and enhancements in change_detection.rb
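
The renaming can be scripted. A minimal sketch using sed, assuming every generated header line starts with "pagelyzer (initial-"; the function name fix_changelog_header is ours:

```shell
#!/bin/sh
# Hypothetical helper: rewrite generated changelog headers so the package
# name and upstream version match debian/control (pagelyzer-ruby, 0.9).
# Reads a changelog on stdin and writes the fixed version to stdout.
fix_changelog_header() {
    sed 's/^pagelyzer (initial-/pagelyzer-ruby (0.9-/'
}

echo 'pagelyzer (initial-11-gbbcc12f) unstable; urgency=low' | fix_changelog_header
```

Only header lines match the pattern, so the indented entry lines pass through unchanged.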
 

And an example of a debian/copyright file:

Format: http://dep.debian.net/deps/dep5
Upstream-Name: pagelyzer-ruby
Source: https://github.com/openplanets/pagelyzer

Files: *
Copyright: 2011, 2012 Andrés Sanoja <afsanoja@gmail.com>
           2011, 2012 Stéphane Gançarski <Stephane.Gancarski@lip6.fr>
           2011, 2012 Zeynep Pehlivan <zeynep.pehlivan@gmail.com>
           2011, 2012 Denis Pitzalis <denis.pitzalis@gmail.com>
           2011, 2012 Marc Law <marc.law@lip6.fr>
License: LGPL-3.0+

Files: debian/*
Copyright: 2013 Jordi Creus Tomàs <Jordi.Creus@lip6.fr>
License: LGPL-3.0+

License: LGPL-3.0+
 This package is free software; you can redistribute it and/or
 modify it under the terms of the GNU Lesser General Public
 License as published by the Free Software Foundation; either
 version 3 of the License, or (at your option) any later version.
 .
 This package is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 Lesser General Public License for more details.
 .
 You should have received a copy of the GNU General Public License
 along with this program. If not, see <http://www.gnu.org/licenses/>.
 .
 On Debian systems, the complete text of the GNU Lesser General
 Public License can be found in "/usr/share/common-licenses/LGPL-3".
 

Build the Packages

Make sure you are in the project root directory and use the 'debuild' command to create the packages:

$ debuild -us -uc

That will build the source and binary packages. The -us and -uc options tell debuild not to sign the source and changes files. We would need to sign them to upload them to Ubuntu, Debian or a PPA, but we skip that for now. If you are interested in PPAs, there was a session about them at https://wiki.ubuntu.com/MeetingLogs/openweekhardy/LaunchpadPPAs.

It should create two separate packages, pagelyzer-ruby1.9.1_0.9-rr-cccc_i386.deb and pagelyzer-ruby_0.9-rr-cccc_i386.deb, where rr is the last revision number and cccc is the hash of that revision (both taken from the changelog file).
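
To see which file names to expect, the version string can be read from the first line of debian/changelog. A sketch with sed; expected_debs is a hypothetical helper, and the architecture is passed in by hand rather than detected:

```shell
#!/bin/sh
# Hypothetical helper: derive the expected .deb names from a changelog header
# of the form "name (version) distribution; urgency=...".
expected_debs() {
    header=$1   # first line of debian/changelog
    arch=$2     # e.g. i386 or amd64
    # Keep only the text between the first pair of parentheses.
    version=$(printf '%s\n' "$header" | sed 's/^[^(]*(\([^)]*\)).*$/\1/')
    for pkg in pagelyzer-ruby1.9.1 pagelyzer-ruby; do
        printf '%s_%s_%s.deb\n' "$pkg" "$version" "$arch"
    done
}

expected_debs 'pagelyzer-ruby (0.9-11-gbbcc12f) unstable; urgency=low' i386
```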

For some strange reason this command is not always enough, so we also run:

$ dpkg-buildpackage

Note: be careful if you use a virtual machine (e.g., VirtualBox). Your files should be in a folder where you have “real” write permissions (e.g., a folder inside your virtual disk). If you keep your files in a shared folder instead, you will come across a 'Read-only file system' error.
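
A quick way to find out before starting a long build is to try writing a scratch file. This is a sketch under our own naming (can_write_here is a hypothetical helper); it only checks that a file can be created, not everything the build needs:

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm that a directory really accepts
# writes by creating and removing a scratch file in it.
can_write_here() {
    dir=${1:-.}
    probe="$dir/.write-test.$$"
    # The subshell keeps a failed redirection from aborting the script.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        return 0
    fi
    return 1
}

if can_write_here .; then
    echo "build directory is writable"
else
    echo "build directory is not writable; copy the sources elsewhere" >&2
fi
```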

Signing your package

First you need a GPG key. Follow the steps at http://keyring.debian.org/creating-key.html. If you have never run gpg before, run it once:

$ gpg

This will create the ~/.gnupg directory; you will then be able to modify the ~/.gnupg/gpg.conf file according to the tutorial. Finally, do not forget to publish your new key on a PGP keyserver (it will be distributed across the keyserver network automatically within a few minutes):

$ gpg --keyserver subkeys.pgp.net --send-key 12345678

Now, you can finally sign your package by running the command:

$ debuild -k12345678

Copying GPG keys across different machines

If you are building a package for several architectures (i386, amd64, ...), you must sign all of them with the same key. To copy a GPG key from one machine to another, simply copy all the *.gpg files (pubring.gpg, secring.gpg and trustdb.gpg) and gpg.conf from the ~/.gnupg directory on one machine to the other. (The random_seed file is not mandatory.)
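
The copy step can be sketched as a small function. It assumes the source directory is already reachable as a local path (for example a mounted backup); copy_gnupg_files is a hypothetical helper, and the permission tightening is our addition rather than part of the original instructions:

```shell
#!/bin/sh
# Hypothetical helper: copy the key files named above from one .gnupg
# directory to another and restrict permissions on the destination.
copy_gnupg_files() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    chmod 700 "$dst" || return 1
    for f in pubring.gpg secring.gpg trustdb.gpg gpg.conf; do
        if [ -f "$src/$f" ]; then
            cp "$src/$f" "$dst/$f" || return 1
            chmod 600 "$dst/$f" || return 1
        fi
    done
}

# Usage (paths are examples only):
# copy_gnupg_files /media/backup/.gnupg "$HOME/.gnupg"
```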

Uploading your package to the repository

Install and configure dupload following the tutorial at http://wiki.opf-labs.org/display/SP/Submitting+Your+Package. Do not forget to enable FTP_PASSIVE mode!

$ export FTP_PASSIVE=1

Finally, you can upload your files:

$ dupload pagelyzer-ruby_0.9-12-gbbcc12f_amd64.changes
...
$ dupload pagelyzer-ruby_0.9-12-gbbcc12f_i386.changes
...
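
With one .changes file per architecture, it can help to list what would be uploaded first. A dry-run sketch: list_uploads is a hypothetical helper that only prints the dupload commands and never runs them.

```shell
#!/bin/sh
# Hypothetical helper: print the dupload command for every .changes file in a
# directory, so the list can be reviewed before anything is uploaded.
list_uploads() {
    dir=${1:-.}
    for changes in "$dir"/*.changes; do
        [ -e "$changes" ] || continue   # no matches: the glob stays literal
        printf 'dupload %s\n' "$changes"
    done
}

# The build tools write the .changes files to the parent of the source tree.
list_uploads ..
```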

Installing the software

Double-click the .deb file:

pagelyzer-ruby1.9.1_0.9-xx-yyyyyyy_i386.deb

This will install everything you need as well as the tool itself.

 

Authors: Andrés Sanoja & Jordi Creus

 

Attachment: setup-3.4.1.zip (57.85 KB)
Categories: Planet DigiPres

National Digital Stewardship Residency Call For Applications Now Open

The Signal: Digital Preservation - 15 February 2013 - 6:09pm

The following is a guest post by Ali Fazal, Program Management Assistant for NDSR at the Library of Congress

The Library of Congress and the Institute of Museum and Library Services are pleased to announce the official open call for applications for the first National Digital Stewardship Residency.  From today until April 5, 2013, applicants can log onto the application portal here and apply to be a part of NDSR’s inaugural class.

The NDSR program will allow 10 recent master’s program graduates in relevant fields to complete a nine-month residency at various institutions in the Washington, D.C. area.  Beginning in September 2013, accepted residents will attend an intensive two-week digital stewardship workshop at the Library of Congress. Thereafter, residents will move to a host institution to work on significant digital stewardship projects. These projects will allow them to acquire hands-on knowledge and skills involving the collection, selection, management, long-term preservation, and accessibility of digital assets.

The Smithsonian Institution Archives, the World Bank, the Library of Congress, and the Public Broadcasting Service are just a few of the renowned institutions that will host the founding NDSR class. Each of these institutions submitted challenging project proposals that can be viewed on the NDSR website here. Application requirements include a detailed resume and cover letter, undergraduate and graduate transcripts, three letters of recommendation, and a creative video that defines an applicant’s interest in the program.  (See this page for more application information.)

“The video requirement in the application process will help the selection panel identify which candidates are truly passionate about digital preservation,” explained George Coulbourne, Library of Congress OSI executive program officer.

For additional information and updates regarding the National Digital Stewardship Residency, please see our website.

 

Categories: Planet DigiPres

Preserving Born-Digital Community and Hyperlocal News

The Signal: Digital Preservation - 15 February 2013 - 4:06pm

Guest post by Abbie Grotke, Library of Congress Web Archiving Team Lead, and Co-Chair of the National Digital Stewardship Alliance Content Working Group. 

You may have read the news last week that the community news website EveryBlock shut its doors rather abruptly. Founded in 2007 with help from a Knight Foundation News Challenge grant, EveryBlock’s goal was to “create, test and release open-source software that links databases to allow citizens of a large city to learn (and act on) civic information about their neighborhood or block.”

The shutdown surprised the founder of the site (who had recently left the company), as well as users, who had no warning and no opportunity to download any of their content. An article on the Knight Digital Media Center proclaims this “how NOT to shut down a community site.”

I participate in the National Digital Stewardship Alliance Content Working Group, where we’ve broken up into Content Teams to focus on particular areas of born digital content in need of preserving. Members of the News, Media and Journalism Content Team (led by Kristine Hanna at the Internet Archive, and including Cathy Hartman from the University of North Texas Libraries, James Simon from the Center for Research Libraries, John Weise from the University of Michigan, Gail McMillan of Virginia Tech, and myself and Amber Paranick from LOC) have been developing case studies to describe some of the challenges and potential opportunities with preserving at-risk news content. With the news of EveryBlock’s shutdown, we thought it a good time to roll out our Born-Digital Community and Hyperlocal News case study (PDF) to the public for comment. In addition, we’ve been working on case studies for newspaper e-prints (PDF) and citizen journalism (PDF) (blogs and social networking sites), and we invite you to take a look at these as well.

Our case studies (and those being worked on in some of the other teams) aim to:

  • Establish the value of the content and provide the rationale for selecting it for preservation. What value does the content have? Is anyone collecting it? What factors affect its risk of disappearance?
  • Document recognized opportunities for preserving this content. Are there workflows in the creation or distribution of content that present opportunities for preservation?
  • Describe target audiences/stakeholders. Who would find value in this content and how might they be engaged in the process of preservation?
  • Outline a plan for educating stakeholders. How might NDSA or another organization raise the awareness of stakeholders including content creators, publishers, educators, libraries, researchers, or donors?
  • Describe potential obstacles or risk factors. What barriers for users/creators/preservationists might be faced and what options are there for overcoming them?
  • Develop actionable next steps. What can we do next, as a community or as individual institutions to ensure that important content is preserved?

We welcome community feedback on our case studies as well as ideas on possible next steps to reach out to stakeholders of these types of content. The case studies are a first step toward articulating some of the issues, the ways the community might work together to preserve at-risk news content and, most importantly, where we go from here.

We hope that by raising awareness about the importance of preserving such content, both creators and publishers of sites such as EveryBlock would think about the value that such a resource might have for future scholars and historians interested in the creation of local news by communities, citizen journalism, and how these media outlets were distributed in the early 21st century. There is no print equivalent to EveryBlock. I was pleased to see that some versions of EveryBlock were archived at the Internet Archive, though this is not something that can be guaranteed all the time; and because of the way the site is constructed, navigating to the old posts can be tricky if not impossible for some of the content.

With born-digital news content, if we don’t preserve it now before it goes away, it can and will be lost forever. We can’t save everything; but with no warning of a shutdown, the archivists and the curators don’t even get a chance to make that choice.

Categories: Planet DigiPres

Reaching out from L-space, part 2

File Formats Blog - 15 February 2013 - 12:07pm

(This is a continuation of Reaching out from L-Space.)

Let’s look more specifically at digital preservation. This is something that should be of interest to everyone, since we all have files that we want to keep around for a long time, such as photographs. Even so, it doesn’t get wide notice as an area of study outside libraries and archives. All the existing books about it are expensive academic volumes for specialists.

Efforts are being made. The Library of Congress has digitalpreservation.gov, which has a lot of information for the ordinary user. There’s the Personal Digital Archiving Conference, which is coming up shortly.

At PDA 2012, Mike Ashenfelder said in the keynote speech:

Today in 2012, most of the world’s leading cultural institutions are engaged in digital preservation of some sort, and we’re doing quite well after a decade. We have any number of meetings throughout the year — the ECDL, the JCDL, iPres, this — but despite this decade of institutional progress, we’ve neglected the general public, and that’s everybody.

Why hasn’t there been more of an effect from these efforts? One reason may be that they’re pitched at the wrong level, either too high or too low. Technical resources often aren’t user-friendly and are useful only to specialists. The Library of Congress’s efforts are aimed largely at end users, and they’re sometimes very basic and repetitive. A big issue is picking the right level to talk to. We need to engage non-library techies and not just stay inside L-space.

Let’s narrow the focus again and look at JHOVE. It’s a software tool that was developed at Harvard; the design was Stephen Abrams’, and I wrote most of the code. It identifies file formats, validates files, and extracts metadata. Its validation is strictly by the specification. Its error messages are often mysterious, and it doesn’t generally take into account the reality of what kinds of files are accepted. Postel’s law says, “Be conservative in what you do; be liberal in what you accept from others”; but JHOVE doesn’t follow this. As a validation tool, it does need to be on the conservative side, but it may go a bit too far.

JHOVE is useful for preservation specialists, but not so much for the general user. I haven’t tried to change its purpose; it has its user base and they know what to expect of it. There should also be tools, though, for a more general user base.

JHOVE leads to the issue of open source in general. As library software developers, we should be using and creating open-source code. We need to get input from users on what we’re doing. Bram de Werf wrote on the Open Planets Foundation blog:

You will read in most digital preservation survey reports that these same tools are not meeting the needs of the community. At conferences, you will hear complaints about the performance of the tools. BUT, most strikingly, when visiting the sites where these tools are downloadable for free, you will see no signs of an active user community reporting bugs and submitting feature requests. The forums are silent. The open source code is sometimes absent and there are neither community building approaches nor procedures in place for committing code to the open source project.

Creating a community where communication happens is a challenge. Users are shy about making requests and reporting bugs. I don’t have a lot of good answers here. With JHOVE, I’ve had limited success. There was an active community for a while; users not only reported bugs but often submitted working code that I just had to test and incorporate into the release. Now there’s less of that, perhaps because JHOVE has been around for a long time. An open source community requires proactive engagement; you can’t just create a project and expect input. Large projects like Mozilla manage to get a community; for smaller niche projects it’s harder.

Actually, the term “project” is a mistake if you think of it as getting a grant, creating some software, and being done with it. Community involvement needs to be ongoing. Some projects have come out of the development process with functioning code and then immediately died for lack of a community.

Let’s consider format repositories now. An important issue in preservation is figuring out the formats of mysterious files. Repositories with information about lots of different formats are a valuable tool for doing this. The most successful of these is PRONOM, from the UK National Archives. It has a lot of valuable information but also significant holes; the job is too big for one institution to keep up with.

To address this difficulty, there was a project called GDFR — the Global Digital Format Repository. Its idea was that there would be mirrored peer repositories at multiple institutions. This was undertaken by Harvard and OCLC. It never came to a successful finish; it was a very complex design, and there were some communication issues between OCLC and Harvard developers (including me).

A subsequent effort was UDFR, the Unified Digital Format Repository. This eliminated the complications of the mirrored design and delivered a functional website. It’s not a very useful site, though, because there isn’t a lot of format information on it. It wasn’t able to develop the critically necessary community.

A different approach was a project called “Just Solve the Problem.” Rather than developing new software, it uses a wiki. It started with a one-month crowdsourced effort to put together information on as many formats as possible, with pointers to detailed technical information on other sites rather than trying to include it all in the repository. It’s hard to say for sure yet, but this may prove to be a more effective way to create a viable repository.

The basic point here is that preservation outreach needs to be at people’s own level. So what am I doing about it? Well, I have an e-book coming out in April, called Files that Last. It’s aimed at “everygeek”; it assumes more than casual computer knowledge, but not specialization on the reader’s part. It addresses the issues with a focus on practical use. But so much for my book plug.

To recap: L-space is a subspace of “Worldspace,” and we need to reach out to it. We need to engage, and engage in, user communities. Software developers for the library need to reach a broad range of people. We need to start by understanding the knowledge they already have and address them at their level, in their language. We have to help them do things their way, but better.


Tagged: communication, JHOVE, libraries, preservation, software
Categories: Planet DigiPres

Reaching out from L-Space

File Formats Blog - 15 February 2013 - 12:04pm

(This article is based on a presentation I made at Dartmouth’s Baker Library on February 7. I’m working from the outline rather than a transcript and have made some changes for the written medium. It’s split into two parts because of its length.)

Terry Pratchett wrote in Guards! Guards!:

It seemed quite logical to the Librarian that, since there were aisles where the shelves were on the outside then there should be other aisles in the space between the books themselves, created out of quantum ripples by the sheer weight of words. There were certainly some odd sounds coming from the other side of some shelving, and the Librarian knew that if he gently pulled out a book or two he would be peeking into different libraries under different skies.

All libraries everywhere are connected in L-space. All libraries. Everywhere.

Right now we’re in the L-space connection between developers and librarians, and the one between librarians and developers on the one hand and students and faculty on the other. L-Space can be a trap, though. If we stay inside it so much that we only talk to each other, we’re missing the whole point of the library’s existence. Pratchett’s Librarian falls a bit short on communication skills, since he’s an orangutan; then again, so do a lot of programmers. Maybe that’s why they call us code monkeys.

The issue of talking tech to non-techies isn’t just for programmers. Librarians are immersed in tech jargon these days: OPACs, MARC records, the OAIS model, etc. Communication levels aren’t just a binary issue. There’s a saying: “There are 10 kinds of people: those who understand binary and those who don’t.” It’s easy to split the world into “us” and “everyone else.” We all have our own sets of assumptions, which we may not realize are there. “Everyone knows” certain things, and those who don’t must be “hopelessly ignorant.” Everyone but the ignorant knows the difference between an application and a file format, Java and JavaScript, what happens in the browser and what happens in the server. It’s easy for any in-group to think of the rest of the world as just outsiders, and for programmers to think of everyone else as computer-illiterate.

However, all people have their own specialties and knowledge. Faculty clearly have their specialties. Students are more comfortable with some kinds of tech, like mobile devices, than many of us are. A good friend of mine is a grocery clerk, and she can teach me things about product codes and scanners. It’s a deadly error to assume that people are too dumb to grasp the benefits of something. This assumption can be harder to work past than actual user ignorance.

For example: I live in a condominium, which is very well-managed on the whole. At one owners’ meeting, though, I pointed out a problem with the PDF newsletters that were being sent by email. They’re sent as scanned images, not as text PDFs, which means they aren’t searchable and people with vision problems can’t take advantage of technologies such as text-to-speech. One of the board members told me I was entirely right, but the owners just weren’t capable of understanding such issues, so it wasn’t worth doing anything. He said this in front of the owners!

People are generally better at solving practical problems than at abstract reasoning. We evolved to survive, not to fit any specific paradigm of knowledge. People understand what they need to understand.

Successful communication happens when the message received equals the message sent. It requires that the parties have a common language, and it can happen only when they share an area of understanding.

Developers need to understand their audience. “Non-programmer” doesn’t mean “non-computer-literate.” Communication needs to be in terms which relate to the audience’s purpose. This comes in two levels for library developers: Talking to library people in library terms, and talking to library users in the terms in which they use the library. We need the help of library people when doing the second.

We’re dealing with a knowledgeable audience: students and faculty. They understand the Internet on a user level. They know how to look for books, even if they do it mostly on Amazon. Students in particular understand mobile devices. Talking below their level is as bad as going over their heads. We need to know what their world is, and we need to address its needs. We need to make the library fit the users’ world.

We have to offer something that’s worth trying out and make it easy to understand. It has to offer something they don’t already have. There’s a saying: “The Internet is the world’s largest library, with all the books on the floor.” The users should get the sense not just that the books are on shelves, but that they control the shelving, that they can organize information the way they need it.

On the whole and on average, users think less analytically than programmers. They don’t see all the consequences of a proposed fix. For instance: Users may complain about having to log back into a system too frequently. The obvious fix for them is to increase session length and time out less often, but they may not think of the loss of security that results, especially on public computers.

Users like DWIM systems — ones that “do what I mean.” These have to guess what the user means. When they guess right, it’s great, but it’s really annoying when they guess wrong. If you’ve ever had a search engine rewrite your search, you know what I mean. Try searching for “droid file tool,” looking for results about the UK National Archives’ file-identification tool called Droid. On Google, you’ll get a bunch of results for “Android.” That’s not the Droid you’re looking for.

Developers need to explain the consequences of a design choice, that getting X implies also getting Y. Figuring out what will really meet the users’ needs, as opposed to what they initially say they want, can be a challenge.

Again, two paths through L-space are needed here. Librarians need to talk the users’ language, and programmers need to talk the librarians’ and the users’ language. Librarians need to assist us in talking the users’ language.

(Continued in part 2)


Tagged: communication, libraries, software
Categories: Planet DigiPres
