Feed aggregator

Older Personal Computers Aging Like Vintage Wine (if They Dodged the Landfill)

The Signal: Digital Preservation - 29 April 2013 - 8:03pm

We have moved so far so fast with personal computing that older machines are acquiring a cultural patina. Everyone, seemingly, has a memory of “old computers,” even if some people think having a hard drive under 100 gigabytes fits the definition.

e-Waste, by curtis palmer, on Flickr

There are perhaps two ways to think about obsolete computers. One is as trash or e-waste, which is a serious environmental problem. The issue has been building for years as computers and related peripherals age out after a few short years and are replaced by equipment that itself will be tossed in the near future. Even if they work just fine, older machines are often perceived as too slow, too clunky or too uncool to keep around. Recycling is possible, but it doesn’t always happen the way it should, resulting in exposure to dangerous chemicals and other materials.

Ironically, some older machines that escaped being dumped have a second life that far exceeds their original intended purpose. All you have to do is glance at the vintage computing section of an online auction website to see how valuable certain kinds of equipment has become. And, if you are lucky, you can even find good stuff for free: I liberated a fully functional Osborne 1 portable computer from a trash heap a few years ago, for example.

The rarest personal computers are the original models dating back to the 1970s.  I found a great picture on Wikimedia that shows some of the earliest models, now exhibited at the Computer History Museum in Mountain View, California.

Early Personal Computers, Wikimedia Commons

All this is to say that if you know about a stash of old computer equipment, it might be worth checking to see if it has secondary value. Older machines can live on for functional purposes, such as reading old software. Or they might simply have aesthetic value as reminders of the early days of computing. Either possibility beats adding to the e-waste problem.

Categories: Planet DigiPres

DELCAM PLC : Delcam to show electrode solution at Amerimold - 4-traders (press release)

Google News Search: "new file format" - 29 April 2013 - 4:25pm

DELCAM PLC : Delcam to show electrode solution at Amerimold
4-traders (press release)
At the heart of the novel Delcam solution for electrode manufacture is a new file format - the .Trode file. This contains all the information for each electrode project, including not only the electrode design but also the machining and inspection ...

Categories: Technology Watch

Promoting FTL

Files That Last - 29 April 2013 - 11:13am

Looking for a way to get the word out about digital preservation? I’ve added a new page on reviewing FTL to this site. All publicity (well, nearly all) is good!


Categories: Planet DigiPres

Hack to preserve: increasing your organisational competence

Open Planets Foundation Blogs - 28 April 2013 - 4:18pm

While the digital preservation challenge is caused by technology, it is not solved by technology. Many research projects started out with the ambition to devise a technology solution (migration, emulation, encapsulation, etc.) and many memory institutions thought it would suffice to apply the R&D results: the methods and associated tools. However, it has become clear that such all-encompassing solutions do not exist. In addition, many tools and approaches have not survived the R&D stage. So, while R&D remains important for research in specific, well-defined problem areas, it is not the main driving force behind digital preservation.

Although OPF originates from a research project and continues to foster R&D, its philosophy of digital preservation concentrates less on technology as a solution and more on growing digital competence as a long-term approach to digital preservation. In previous blog posts I gave some background on this philosophy, which aims to

1) foster learning by doing as a means to develop skills and expertise in an area where best practices and standards have not yet matured and where research plays an important supportive role;

2) cultivate a community of experts and skilled people who embrace the values of active learning and professional sharing, values which assume a certain degree of organisational readiness on the part of memory institutions.

In this blog post I will explain how the OPF hackathons support these aims and why preservation managers should send their staff to OPF hackathons.

What are OPF hackathons?

Our hackathons are three-day events organised around a specific digital preservation topic or challenge. They bring together curators (those who understand the content and value of their collections) and software engineers (those who understand the underlying digital nature of these collections). In OPF-speak, we bring together the “practitioners” and the “developers”, which is a practical way to distinguish between two different roles: 1) the role of the practitioners, who collect digital materials and can come with real examples and the real, day-to-day problems they encounter when managing these materials; 2) the role of the developers, which is equivalent to that of the “conservators” in the analogue domain: they examine the digital materials (the files and the bit streams underlying the digital objects); suggest methods for storing, displaying, treating and processing them; research new techniques; etc.

In bringing these two roles together we are creating fruitful synergies, which result not only in practical solutions but, more importantly, in a community of experts who share and develop professional practices together. The concept is simple: practitioners bring troubled data and developers “hack” with existing tools and develop practical approaches. Usually the problems and solutions are very hands-on. They are neither about state-of-the-art R&D nor about building future frameworks or digital sustainability platforms. They are not about risk assessment or risk management. We talk about day-to-day operations and the use of tools such as Apache Tika and DROID in real practice. We talk about integrating tools into workflows and compare practices. In this way we are building a shared practice, based on learning by doing.

Why is it important for memory institutions to send their people to OPF hackathons?

Institutions with a mission to preserve society’s digital heritage need to develop competence and confidence in digital preservation. It is OPF’s conviction that the best way to do so is by investing in staff development. OPF hackathons are better (and cheaper) substitutes for training programmes. They help your staff develop the knowledge, skills and abilities needed to perform their daily tasks. Through participation they can rely on peer support from the OPF community and, in turn, derive job satisfaction from contributing to the community.

Categories: Planet DigiPres

Files that Last

File Formats Blog - 26 April 2013 - 7:18pm

Just in case you don’t follow the other channels in which I’ve been talking it up, Files that Last, my new e-book on digital preservation for “everygeek,” is now out. It covers issues of backup, archiving, file formats, and long-term planning. Right now it’s available from Smashwords, Kobo, and the iTunes Store. It hasn’t shown up on Amazon yet, but I expect it will soon.

I’m not exactly impartial on this, but I think you’ll find it a valuable resource for preservation planning on the personal level and for large and small organizations.


Tagged: books, ebooks, preservation
Categories: Planet DigiPres

Data Management Planning tools: still immature?

Unsustainable Ideas - 26 April 2013 - 2:43pm

I’ve spent the last few months looking at the JISC data management planning projects. It’s been very interesting. Data management planning for research is still comparatively immature, and so are the tools that are available to support it. The research community needs more and better tools at a number of levels. Here are my thoughts… what do you think?

At group or institution level, we need better “maturity assessment” tools. This refers to tools like:

  • DCC CARDIO for assessing institutional readiness,
  • the DCC Digital Asset Framework for understanding the landscape of data resources,
  • repository risk assessment and quality assessment tools like DRAMBORA, Data Seal of Approval, etc.,
  • security assessment tools including audits based on ISO 27000.

Some of the existing tools seem rather ad hoc, as if they had emerged and developed from somewhat casual beginnings (perhaps not well put; maybe from beginnings unrelated to the scale of tasks now facing researchers and institutions). It is perhaps now time for a tool assessment process involving some of the stakeholders, to help map the landscape of potential tools and use this to plot development (or replacement) of existing tools.

For example, CARDIO and DAF, I’m told, are really tools aimed at people acting in the role of consultants, helping to support a group or institutional assessment process. It might help if they could be adjusted to be more oriented towards self-assessment. The DAF resource really needs to be brought up to date and made internally consistent in its terminology.

Perhaps the greatest lack here is a group-oriented research data risk-assessment tool. This could be as simple as a guide-book and a set of spreadsheets. But going through a risk assessment process is a great way to start focusing on the real problems, the issues that could really hurt your data and potentially kill your research, or those that could really help your research and your group’s reputation.

We also need better DMP-writing tools, i.e. better versions of DMPonline or DMP Tool. The DCC recognises that DMPonline needs enhancement, and has written in outline about what they want to do, all of which sounds admirable. My only slight concern is that the current approach with templates for funders, disciplines and institutions in order to reflect all the different nuances, requirements and advice sounds like a combinatorial explosion (I may have misunderstood this). It is possible that the DMP Tool approach might reduce this combinatorial explosion, or at least parcel elements of it out to the institutions, making it more manageable.

The other key thing about these tools is that they need better support. This means more resources for development and maintenance. That might mean more money, or it might mean building a better Open Source partnership arrangement. DMPonline does get some codebase contributions already, but the impression is that the DMP Tool partnership model has greater potential to be sustainable in the absence of external funding, which must eventually be the situation for these tools.

It is worth emphasising that this is nevertheless a pretty powerful set of tools, and potentially very valuable to researchers planning their projects and to institutions, departments etc. trying to establish the necessary infrastructure.


Categories: Planet DigiPres

Create new world for Minecraft Xbox 360 update not necessary - Insider Media Group

Google News Search: "new file format" - 26 April 2013 - 3:26am

Create new world for Minecraft Xbox 360 update not necessary
Insider Media Group
“Mojang eventually fixed this problem by introducing a new file format that saves the generated weather for each chunk in addition to the terrain. 4J could have and should have used this new file format from the very beginning. I can forgive Mojang for ...

Categories: Technology Watch

Before You Were Born: Image Digitization, a Personal Reminiscence

The Signal: Digital Preservation - 25 April 2013 - 2:29pm

Image scanning of one sort or another has been in common usage in some industries since the 1920s.

Yes, really, the 1920s.

The news wire services used telephotography — where images are captured using photo cells and transmitted over phone lines — well into the 1990s.  Scanners and digital cameras like those we are familiar with came out of development in the 1960s and 1970s, and were already hitting the commercial market by the 1980s.

I have vivid memories of my first digitization project, because that project changed the course of my career.

In 1986 I was in graduate school and volunteering for the Fowler Museum of Cultural History at UCLA. One day the Collections Manager came down to the archaeology collections in the sub-basement (where I was surveying the human skeletal remains in the collections for our NAGPRA records) and said to me: “How would you like to move from the sub-basement to the basement?” How could anyone say no to that?

A 1987 digital image print

The project was to do a retrospective conversion (recon) of all the paper records and enter them into the brand-new Argus system running on a mini-mainframe. I am pretty certain that we were Questor’s second customer, after the Southwest Museum. While the recon project taught me the basics of what became the focus of my career (collection records management, digitization, system administration, being a DBA, working with authority control and creating multilingual controlled vocabularies), what was particularly exciting about the system was that it had the capacity to link to digital images.

So we started digitizing.  We had acquired a particularly exciting and important archaeological collection, and I had the opportunity to work on the digitization.  The objects were set on a stand and the image was captured via a video camera and written to tape, with a video titler used to embed the accession number into the image. The tapes were then mastered onto laser disks.

Now, this was very cutting edge – one entered an address for an image on a laser disk into a field in the object record, and the system could address the file on the laser disk and display it on a dedicated terminal. We had an early Sony Mavica camera, which used 3.5″ floppy disks as its storage media. And we had a printer, which printed color photos the size of old school Polaroids. It was heady stuff.

In 1988 I attended my first Museum Computer Network conference, another event that shaped my career.  The 1989 MCN meeting was the pivotal one.  We had our first meeting of a Visual Information SIG, where at least a dozen organizations shared their experiments, successes, and failures with digital imaging. I still have my write-up from that meeting, which appeared as a column in Spectra. I chaired that group for many years, and that group helped build a community around imaging practice that still exists.

Of course there were many early leaders and innovators in digital imaging. The American Museum of Natural History.  The Fine Arts Museums of San Francisco Thinker imagebase. The Library of Congress American Memory project.  Harvard University’s libraries and museums. Numerous Smithsonian projects. And too many others to name.

What other imaging projects were people involved in during the 1980s?  If you are interested in the history of digital imaging I suggest the Digital Imaging page at CoOl, which includes a great historical bibliography. Not all the links work, but it’s a great jumping-off point for a history of the discipline.

Categories: Planet DigiPres

The SPRUCE Business Case for Digital Preservation

OPF Wiki Activity Feed - 25 April 2013 - 12:59pm

Page edited by Paul Wheatley

View Online Paul Wheatley 2013-04-25T12:59:17Z

FTL, now available from iTunes and Kobo!

Files That Last - 25 April 2013 - 10:14am

You can now buy Files that Last from the iTunes Store or from Kobo!

Which way should you go? I’ll say first of all, just buy the book and I’ll be happy. Buying through Smashwords will give me a bigger cut than the other channels, but a sale’s a sale. If you’re planning to read it on an iPhone, iPod, or iPad, getting it from Apple is the easiest way to get it on there. I don’t really know anything about Kobo.

There should be more ways to buy FTL within the next week or so.


Categories: Planet DigiPres

Agenda - Disk Images and Digital Forensics (Copenhagen)

OPF Wiki Activity Feed - 24 April 2013 - 4:41pm

Page edited by Carl Wilson - "Added detail to failing drive session"

View Online Carl Wilson 2013-04-24T16:41:31Z

Slide show on FITS progress

File Formats Blog - 24 April 2013 - 4:09pm

Last Friday’s CURATEcamp AVpres was a collaboration between several physical sites, using Google Hangout and IRC. I’d been asked if I could do a lightning presentation online on my work on FITS, but I had a commitment on the 19th, so Andrea Goethals at the Harvard Library said she’d do one.

That, unfortunately, was the day the Tsarnaev brothers went on their spree in Cambridge, and Harvard was closed for the day. Paul Wheatley picked up the job on short notice and did a presentation; the slide show is online. Paul suggested that people look at the work I’m putting in the GitHub repository once I finish at the end of April, but I wouldn’t mind if people tried it out now, while I’m still devoting my time to the project.


Tagged: FITS, software
Categories: Planet DigiPres

FFident

File Formats Blog - 24 April 2013 - 2:35pm

A simple but useful tool that’s part of FITS’s collection is FFident, written by Marco Schmidt. He apparently is no longer maintaining it, and its page disappeared from the Web but was retained on the Internet Archive. It seemed like a good idea to make it more readily available, so I’ve put it into a GitHub repository under its LGPL license.

FITS uses its own copy of the source code, so this version hasn’t really been tested in its own right. I added a build.xml file and organized the code the way Eclipse likes it. I don’t have any plans to support it, but if anyone wants to play with it, it’s there.


Tagged: format identification, java, software
Categories: Planet DigiPres

The Content Matters Interview Series: Dr. Sylvia Chou of the National Cancer Institute

The Signal: Digital Preservation - 24 April 2013 - 2:01pm

The following is a guest post by Christie Moffatt, Manager, Digital Manuscripts Program, History of Medicine Division, National Library of Medicine.

Dr. Sylvia Chou

In this installment of the “Content Matters” series of the National Digital Stewardship Alliance Content Working Group, I interview Dr. Sylvia Chou, PhD, MPH, Program Director of the National Cancer Institute’s Health Communication and Informatics Research Branch.  Sylvia recently spoke at the National Library of Medicine on “Health Communication in the Digital World: Innovation and the Role of Social Media,” sharing some of her research on health-related uses of social media. In this interview, I asked Sylvia to describe a bit more about her research and her perspective on the value of preserving Web 2.0 communications (as well as scientific communications about those communications) over time.

Christie: Could you share a bit about your interest in social media and health?  What is the focus of your current research?

Sylvia: I came to the National Cancer Institute as a Cancer Prevention Fellow interested in how people communicate about health and the impact of communication on attitudes, perceptions, and behavior related to health. One of my first publications, “Social Media Use in the United States: Implications for Health Communication,” based on data from the Health Information National Trends Survey (HINTS), received attention, particularly for its findings on the digital divide and on how public health and clinical researchers may leverage Web 2.0 communication in their work.

While analyzing national surveys on this topic, I also engaged in qualitative research on social media in an effort to begin understanding motivations and the nature of use. As an example, I began studying cancer survivors’ narratives posted on YouTube to better understand how and why people share personal stories through what was later termed “user-generated content,” a characteristic of social media. Subsequently, in a literature review, we saw an abundance of commentaries about social media, but generally not as much empirical work to date testing the utility of social media for health promotion. We felt that the time was ripe for more rigorous research on the topic.

Screen shot from HINTS

Christie:  What have you been able to learn through this research about the users of social media for health communication? How are people using social media to communicate about health?

Sylvia: We’ve learned that social media reactions (to health messages, including health campaigns and advertisements) are proving to be an authentic representation of the way people feel about them and can serve as interesting “data” for social scientists. Also (and related), that active users of social media have different levels of self-disclosure.  In some social media venues (e.g. Facebook), people seem to post everyday thoughts and observations, with little editing or shielding of privacy.

We’ve also learned more about the influence of user-generated health content, and its potential for the dissemination of public health information.  For example, there are many YouTube videos or blog posts of people sharing personal stories about a specific health care procedure, like colonoscopy, where they report that “it wasn’t so bad” or “it saved my life.”   These narratives can be more persuasive than traditional public health guidelines or communication efforts.  Stories like these are not coached and not perfect, but perhaps their authenticity makes them more effective in health promotion efforts.  On the other hand, the use of personal narratives in social media can potentially have negative health impact.  Individuals’ experiences are not necessarily evidence-based and can contain health myths, which can potentially be spread quickly on social networks.

Christie:  Where do you expect this research will make its greatest impact?

Sylvia: Empirical research on social media will be helpful to those developing strategies for health campaigns. This research may also be helpful to clinicians, as they become more aware of the impact of social media conversations (e.g. negative stereotypes of individuals suffering from health issues) when helping patients. For instance, in our project examining social media discourse about obesity, we found rampant weight stigmatization against individuals struggling with weight issues. Documentation of such online behavior can help clinicians and public health practitioners better understand the experiences of their clients and the barriers they face to improving their health.

Christie: What is your approach to finding and analyzing social media for use in your research? What types of communications are you studying?

Sylvia: We have used commercial data mining companies, which are often geared more towards commercial/marketing purposes, but we can do keyword searches on blogs, forums, YouTube, Facebook, and Twitter. My colleagues and other investigators have relied on Internet panels (e.g. focus groups and survey questionnaires), and on data that Google and Twitter make available to researchers.

Christie:  How do you gather and collect the data for your research? Do you keep an archive of this data for other researchers?

Sylvia: We have stored data in the cloud (such as an Amazon server or Dropbox). Increasingly we are seeing large data stored in such ways. My qualitative research (e.g. on YouTube posts) is smaller in scope, so the data are a bit easier to store. We save the URL and transcribe the video content. We also use low-tech screen captures.

Christie:  Have you considered working with data in web archives?

Sylvia: I would love to work with web data in an archive.  A main concern would be about selection bias.  When I gather my own data I know my own selection criteria.  A web archive would need to be clear on how content was selected for inclusion. It is also important to be able to date/time-stamp captured content, to be able to say “as of this date this is what the content was.”

Christie: What are your observations on how the communication of research findings has changed with Web 2.0 technologies?  Do you have a blog or online notebook?

Sylvia: Many individuals and agencies feel pressure to host a blog and tweet (we see an example of this in our own NIH Director, Francis Collins), but I have not done so yet and continue to publish my research through the formal peer review process. At the end of the day, having my publication in PubMed provides credibility and a searchable presence. As a scientist, peer review justifies what I do. I have been thinking, though, of becoming a little more active on Twitter, as oftentimes I have to rely on my colleagues (particularly postdocs) to notify me of relevant Tweets regarding my presentations/research, and I probably should start becoming more engaged on these channels.

Christie: How can the digital preservation community reach out to researchers on issues of digital preservation?

Sylvia: The digital preservation community could aim to capture what is equally important but not captured in traditional scientific publishing. Some academics, for example, may be substituting engagement on blogs for traditional publishing venues, and such work is not reflected in the scientific databases.

Perhaps institutions could partner with researchers to develop preservation strategies to support research questions (i.e. preserve the specific social media or other digital content used in research) and manage that data in a scientific space.

Christie: One of the struggles the digital preservation community faces is the fact that not everything can (or should) be preserved.  What is your perspective on the most valuable content to preserve in support of your own research?

Sylvia: The digital preservation community should play a role in selectively preserving/documenting the diachronic evolution of material, content and ways of sharing value, with an aim to filter out some of the noise.

On the other hand, I can see that with more content that is captured, more research can be done.  Some kind of sampling of social media could be done, but the process for doing so would need to be well-framed. The amount selected would depend on the level of resources available.

From a social science or epidemiological perspective, there are some interesting research questions about the history of health and health communication.  An archive of health-related social media communications could serve as a rich resource for studying how people talk about health and science in a given time or place, and how these types of communications are changing over time. We can study shifts in the way that information is communicated.

——-

What kind of content matters to you? This is but one case for preserving valuable content for long-term access.  If you or your institution would like to share your own story of use and long term value of access to a particular type of born-digital resources, please send us a note at ndsa@loc.gov and in the subject line mark it to the attention of the Content Working Group.  We would love to hear from you!

Categories: Planet DigiPres

FTL for libraries

Files That Last - 24 April 2013 - 10:36am

Libraries can buy Files that Last through Axis360 and Cloud Library, or will be able to at some point in the future. Since libraries are clearly key customers, both as users and as lenders, I’ve made the book available to them at a permanent discount, for $6.99. In addition to those aggregators, libraries can buy through Smashwords’ Library Direct.

Librarians, please let me know if you have good or bad experiences buying the book this way, or if you’ve had past experience with these channels.


Categories: Planet DigiPres

Personal Digital Preservation Sonnet

The Signal: Digital Preservation - 23 April 2013 - 3:04pm

The following is a guest post by Tess Webre, intern with NDIIPP at the Library of Congress

Shall I compare thee to a closed format/ thou art more open and accessible. – Author after 3 coffees.

In honor of William Shakespeare’s 449th birthday on April 23rd (observed) and as a tie-in with the recent digital humanities day, I have decided to make a Digital Stewardship sonnet. This is a traditional sonnet; it has iambic pentameter and everything – courtesy of Mr. Boswell’s 11th-grade AP English. See, I was paying attention – however, I cannot attest to the content being traditional.

William Shakespeare, from tonynetone on Flickr

Personal Digipres Sonnet
By Tess Webre

Want to ensure a file’s longevity
So that you can always access the bits?
First know, it’s your responsibility
And you must use your cunning and your wits.

Organize your files, that’s simple enough.
In an open format, you know the drill
Update storage media – care ain’t tough –
Keep dispersed copies, just in case of spills.

To help, for we know that this might sound like Greek,
The Signal is a blog we hope you love,
It also would be good to take a peek
At digitalpreservation.gov.

Keep studying; your skills will be greater.
‘Til next time I wish you all safe data.

Categories: Planet DigiPres

Re: Building Your Debian Package

OPF Wiki Activity Feed - 23 April 2013 - 12:52pm

Comment edited by Johan van der Knijff

As a follow-up to this, in the blog post below I describe how I used this guide to create Debian packages for jpylyzer:

http://www.openplanetsfoundation.org/blogs/2013-04-23-adventures-debian-packaging

Two minor corrections (I think) to this guide:

1. in the Getting Started section:

Directory name: by default, dh_make doesn't accept package and version names that are separated by underscores ('_'), and expects a dash ('-') instead. Underscores are only allowed in combination with the -p flag.

2. in the Verifying your Package section:

  • E: Error (should be fixed)
  • W: Warning (must be fixed)

I think this should be the other way round?
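For readers checking their own packages: lintian prefixes each reported tag with a one-letter code, such as E: for errors and W: for warnings. A minimal sketch, assuming the lintian output has been saved to a text file (the function name lintian_summary and the file name are hypothetical), that counts both kinds of lines and returns a non-zero status when any errors are present:

```shell
#!/bin/sh
# Hypothetical helper: summarise saved lintian output by severity.
# Only lines starting with "E:" (errors) and "W:" (warnings) are counted.
# Returns 0 if there are no errors, 1 if there are, 2 if the log is unreadable.
lintian_summary() {
    log="$1"
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    errors=$(grep -c '^E:' "$log")
    warnings=$(grep -c '^W:' "$log")
    echo "errors: $errors, warnings: $warnings"
    [ "$errors" -eq 0 ] || return 1
}
```

For example, lintian ../*.changes > lintian.log followed by lintian_summary lintian.log. Other codes (such as I: for informational tags) are simply ignored by this sketch.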

View Online Johan van der Knijff 2013-04-23T12:52:58Z

Re: Building Your Debian Package

SCAPE Wiki Activity Feed - 23 April 2013 - 12:52pm

Comment edited by Johan van der Knijff

As a follow-up to this, in the blog post below I describe how I used this guide to create Debian packages for jpylyzer:

http://www.openplanetsfoundation.org/blogs/2013-04-23-adventures-debian-packaging

Two minor corrections (I think) to this guide:

1. in the Getting Started section:

Directory name: by default, dh_make doesn't accept package and version names that are separated by underscores ('_'), and expects a dash ('-') instead. Underscores are only allowed in combination with the -p flag.

2. in the Verifying your Package section:

  • E: Error (should be fixed)
  • W: Warning (must be fixed)

I think this should be the other way round?

View Online Johan van der Knijff 2013-04-23T12:52:58Z
Categories: SCAPE

Adventures in Debian packaging

SCAPE Blog Posts - 23 April 2013 - 10:53am

About a year ago, work started on packaging SCAPE tools. Jpylyzer was the first SCAPE tool that was turned into a Debian package. Some time later, the OPF set up a couple of machine images at Amazon Web Services, which can be used to create packages repeatedly using a virtual machine. Even though I've used the Amazon service a couple of times myself, I really know next to nothing about Debian packages, and it's safe to say that the underlying build process has been more or less a complete mystery to me.

To get a better understanding of the process for building Debian packages, I had a try at packaging jpylyzer on my local machine (which runs on Linux Mint 14). Some time ago Dave Tarrant and Rui Castro wrote a nice step-by-step guide on building Debian packages on the OPF Wiki, so I tried to follow the instructions there. While working on this, I made some notes, mainly to remind myself of what I was doing. Then I realised that some of this might be useful to others as well, so I decided to turn it into a blog post.

Objectives

The objectives of this exercise were:

  • to get more familiar with the packaging process myself;
  • to provide some input on how useful the guide on the OPF Wiki is from the perspective of someone who is largely ignorant of the packaging procedure;
  • to identify any problems in jpylyzer's packaging procedure.

I did two experiments. First, I did a very limited test in which I tried to create a template directory structure using dh_make, which would be the first step when starting from scratch. Since all the files in jpylyzer's debian directory already exist, I then moved on to building jpylyzer using those existing files.

Test 1: creating the directory structure from scratch

For this, I first installed all the required packages listed in the Pre-Requisites section of the guide using:

sudo apt-get install build-essential dh-make devscripts debhelper lintian

Subsequently I followed the instructions in the Getting Started section. For this I simply created an empty directory:

mkdir debtest_1.0.0

And then:

cd debtest_1.0.0

Then I ran dh_make:

dh_make

This resulted in an error message, telling me that the package name and its version number should be separated by a dash ('-') instead of an underscore ('_'), or, alternatively, that the -p flag should be used. So I changed the directory name:

mv debtest_1.0.0 debtest-1.0.0
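The rule dh_make enforced here is that the source directory is named <package>-<version>. Below is a minimal Python sketch of such a check. It assumes Debian Policy's package-name rules (lower-case letters, digits, '+', '-' and '.', at least two characters, starting with an alphanumeric), splits the name at the last dash, and treats a version as valid if it starts with a digit. The helper split_source_dir is hypothetical and is not how dh_make itself is implemented.

```python
import re

# Debian Policy: package names use only a-z, 0-9, '+', '-' and '.',
# are at least two characters long and start with an alphanumeric.
PACKAGE_RE = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")
# Assumption for this sketch: a version starts with a digit and contains
# no hyphen (we split on the last one), epoch or Debian revision.
VERSION_RE = re.compile(r"^[0-9][A-Za-z0-9.+~]*$")


def split_source_dir(dirname):
    """Split a '<package>-<version>' directory name into (package, version).

    Hypothetical helper mimicking the naming rule dh_make complained
    about; raises ValueError for names such as 'debtest_1.0.0'.
    """
    package, sep, version = dirname.rpartition("-")
    if not sep or not PACKAGE_RE.match(package) or not VERSION_RE.match(version):
        raise ValueError(f"{dirname!r} is not of the form <package>-<version>")
    return package, version


if __name__ == "__main__":
    for name in ("debtest-1.0.0", "debtest_1.0.0"):
        try:
            print(name, "->", split_source_dir(name))
        except ValueError as err:
            print(name, "->", err)
```

Running a check like this before dh_make would catch the underscore mix-up; the alternative suggested by dh_make's own error message is the -p flag.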

When I re-ran dh_make, it accepted the directory name, but it complained about a missing tarball (which I deliberately didn't make in this test). However, since dh_make suggested using the --createorig option (which creates a tarball), I tried this:

dh_make --createorig

This resulted in the creation of a debian directory with file templates, and an (empty) tarball debtest_1.0.0.orig.tar.gz which was created in the parent (debtest) directory.
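Note that the source directory uses a dash while the orig tarball uses an underscore (<package>_<version>.orig.tar.gz) and sits next to the source directory rather than inside it. A small sketch that derives the expected tarball path from the source directory, assuming the directory is already named <package>-<version> and considering only .tar.gz compression; orig_tarball_path is a hypothetical helper.

```python
from pathlib import Path


def orig_tarball_path(source_dir):
    """Return where the orig tarball for source_dir is expected.

    Assumes source_dir is named '<package>-<version>'; the tarball name
    uses an underscore instead and lives in the parent directory.
    Only the .tar.gz variant is considered in this sketch.
    """
    source_dir = Path(source_dir)
    package, sep, version = source_dir.name.rpartition("-")
    if not sep or not package:
        raise ValueError(f"{source_dir.name!r} is not <package>-<version>")
    return source_dir.parent / f"{package}_{version}.orig.tar.gz"


if __name__ == "__main__":
    path = orig_tarball_path("/home/johan/debtest/debtest-1.0.0")
    print(path)
    print("exists:", path.exists())
```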

So, apart from the dash/underscore mix-up this is all pretty straightforward.

Test 2: building jpylyzer

In this second test I tried to build jpylyzer using the already existing files in the debian folder of jpylyzer's Git repository. First I cloned the repository to my local machine:

git clone git://github.com/openplanets/jpylyzer.git

Then I went into the jpylyzer directory:

cd jpylyzer

From there I tried to build jpylyzer directly, using the command given in the guide's Building your package section:

dpkg-buildpackage -tc

Missing changelog

The above command resulted in an error message about a missing changelog file in the debian folder. The changelog section in the OPF guide does mention an OPF-hosted GitHub 2 Changelog service, which is supposed to be callable from the rules file. But I don't see any reference to it in jpylyzer's rules file, so I don't really know how this is supposed to work! To keep going I simply grabbed the default changelog that dh_make had created in an earlier experiment. After this I ran the command again.
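A debian/changelog entry has a fixed layout: a header line with package, version, distribution and urgency, indented change bullets, and a trailer line with the maintainer and an RFC 2822 date. The devscripts package installed earlier provides dch for creating and maintaining this file. Purely to illustrate the format, here is a minimal Python sketch that renders one entry; changelog_entry is a hypothetical helper (it is not the OPF GitHub 2 Changelog service), and the maintainer name and e-mail in the usage example are placeholders.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def changelog_entry(package, version, changes, maintainer, email,
                    distribution="unstable", urgency="medium", when=None):
    """Return one debian/changelog entry as a string.

    Layout: header line, blank line, '  * ' change lines, blank line,
    then ' -- Name <email>  date' with two spaces before the RFC 2822 date.
    """
    if when is None:
        when = datetime.now(timezone.utc)  # timezone-aware, so the offset is +0000
    lines = [f"{package} ({version}) {distribution}; urgency={urgency}", ""]
    lines += [f"  * {change}" for change in changes]
    lines += ["", f" -- {maintainer} <{email}>  {format_datetime(when)}", ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder maintainer details. A real address avoids lintian's
    # debian-changelog-file-contains-invalid-email-address error, and leaving
    # out the template's "Closes: #nnnn" avoids wrong-bug-number-in-closes.
    print(changelog_entry("jpylyzer", "1.9.0", ["Initial release."],
                          "Jane Doe", "jane.doe@example.org"))
```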

Unknown commands in makefile

This time, dpkg-buildpackage exited with the following errors:

pymakespec --onefile jpylyzer.py
make[1]: pymakespec: Command not found
make[1]: *** [build] Error 127
make[1]: Leaving directory `/home/johan/debtest/jpylyzer'
make: *** [build] Error 2
dpkg-buildpackage: error: debian/rules build gave error exit status

These errors arise from the following lines in jpylyzer's makefile:

build:
	pymakespec --onefile jpylyzer.py
	pyinstaller jpylyzer.spec
	@echo "Built in dist/jpylyzer"

The pymakespec and pyinstaller commands above are most likely shell scripts that launch the Makespec.py and pyinstaller.py scripts that are both part of PyInstaller (these are used for building an executable from the source code). However, neither the shell scripts nor any references to them are included in jpylyzer's repository (my best guess is that they exist only on a specific machine instance - perhaps the Amazon virtual machines?), so the makefile simply won't work.
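Error 127 is the shell's "command not found" exit status, so a quick pre-flight check for the tools the makefile expects would flag this before the build starts. A minimal sketch using shutil.which; the command list simply mirrors the makefile above plus the packaging tools used in this post, and missing_commands is a hypothetical helper.

```python
import shutil


def missing_commands(commands):
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    # Tools called by jpylyzer's makefile plus the packaging tools used here.
    needed = ["pymakespec", "pyinstaller", "dpkg-buildpackage", "lintian"]
    missing = missing_commands(needed)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All build commands found.")
```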

I was able to fix this by replacing the calls to these scripts with direct calls to PyInstaller's own scripts (using PyInstaller 1.5):

python /home/johan/pyinstall1.5/Makespec.py --onefile jpylyzer.py
python /home/johan/pyinstall1.5/pyinstaller.py jpylyzer.spec

For PyInstaller 2 these two lines should be replaced by:

python /home/johan/pyinstall/pyinstaller.py --onefile jpylyzer.py

Note here that PyInstaller has no default installation location, and the file paths will vary from machine to machine!
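Because the two PyInstaller versions need different invocations and the install location differs per machine, one option is to build the command lines from a configurable directory rather than hard-coding /home/johan/... paths. A sketch, assuming a hypothetical PYINSTALLER_DIR environment variable and the script names used above (Makespec.py and pyinstaller.py for 1.5, pyinstaller.py for 2); pyinstaller_commands is also hypothetical.

```python
import os
import shlex


def pyinstaller_commands(pyinstaller_dir, major_version, script="jpylyzer.py"):
    """Return the argv lists that build a one-file executable from script.

    Mirrors the makefile fixes above: PyInstaller 1.5 first generates a
    spec file with Makespec.py, PyInstaller 2 does it in a single call.
    """
    if major_version == 1:
        spec = os.path.splitext(script)[0] + ".spec"
        return [
            ["python", os.path.join(pyinstaller_dir, "Makespec.py"), "--onefile", script],
            ["python", os.path.join(pyinstaller_dir, "pyinstaller.py"), spec],
        ]
    if major_version == 2:
        return [["python", os.path.join(pyinstaller_dir, "pyinstaller.py"), "--onefile", script]]
    raise ValueError(f"unsupported PyInstaller major version: {major_version}")


if __name__ == "__main__":
    # Hypothetical variable; falls back to the path used in this post.
    base = os.environ.get("PYINSTALLER_DIR", "/home/johan/pyinstall")
    for argv in pyinstaller_commands(base, 2):
        print(shlex.join(argv))
```

The same idea works directly in a makefile by putting the PyInstaller directory in a variable that can be overridden on the command line.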

After making these changes I was able to run dpkg-buildpackage without any problems:

dpkg-buildpackage -tc

Result: the following files were created in the repo's parent directory:

  • jpylyzer_1.9.0_amd64.changes
  • jpylyzer_1.9.0_amd64.deb
  • jpylyzer_1.9.0.dsc
  • jpylyzer_1.9.0.tar.gz

Tarball schmarball

One thing that confused me at first: the Getting Started section in the OPF guide mentions the need for building a native package before starting the Debian packaging:

If you have got here and you don't have any already packaged code (a tar ball with makefile etc) then you will need to build a native package.

So, I initially thought I would need to create a tarball of my repo first. As it turns out this is not the case: the tarball is created automatically once you run dpkg-buildpackage. So this is one thing less to worry about!
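Whether the build produces a plain <package>_<version>.tar.gz, as it did here, or works with a separate .orig.tar.gz follows from the Debian version format [epoch:]upstream_version[-debian_revision]: by convention, a version without a Debian revision (such as 1.9.0) denotes a native package. A sketch that parses a version string and names the expected source tarball; it only covers the .tar.gz case, and both helper names are hypothetical.

```python
def parse_debian_version(version):
    """Split '[epoch:]upstream_version[-debian_revision]' into its parts.

    The epoch ends at the first colon and the Debian revision starts after
    the last hyphen; both are optional and returned as None when absent.
    """
    epoch = None
    if ":" in version:
        epoch, version = version.split(":", 1)
    upstream, sep, revision = version.rpartition("-")
    if not sep:
        upstream, revision = version, None
    return epoch, upstream, revision


def source_tarball_name(package, version):
    """Name of the main source tarball (.tar.gz only in this sketch).

    No Debian revision means a native package: <package>_<version>.tar.gz.
    Otherwise an <package>_<upstream>.orig.tar.gz is expected.
    The epoch never appears in file names.
    """
    _, upstream, revision = parse_debian_version(version)
    if revision is None:
        return f"{package}_{upstream}.tar.gz"
    return f"{package}_{upstream}.orig.tar.gz"


if __name__ == "__main__":
    print(source_tarball_name("jpylyzer", "1.9.0"))
    print(source_tarball_name("debtest", "1.0.0-1"))
```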

Verifying the package with lintian

As a final step I used lintian to verify my package:

lintian jpylyzer_1.9.0_amd64.deb

This resulted in the following output (using PyInstaller 1.5):

E: jpylyzer: unstripped-binary-or-object usr/bin/jpylyzer
W: jpylyzer: hardening-no-fortify-functions usr/bin/jpylyzer
W: jpylyzer: wrong-bug-number-in-closes l3:#nnnn
E: jpylyzer: debian-changelog-file-contains-invalid-email-address johan@unknown
E: jpylyzer: helper-templates-in-copyright

With PyInstaller 2 I got this additional warning:

W: jpylyzer: hardening-no-relro usr/bin/jpylyzer

I still need to give these errors and warnings an in-depth look. At least one error and one warning (the invalid e-mail address and the placeholder bug number) are related to the bogus changelog file I used. Some others (e.g. unstripped-binary-or-object) appear to be related to the build process of the binaries.
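lintian prefixes each line with a severity letter (E for errors, W for warnings, and others such as I for informational tags), followed by the package name, the tag and any extra information. To work through a list like the one above, here is a small sketch that groups tags by severity. It assumes this one-line-per-tag format and silently skips lines that do not match; group_lintian_tags is a hypothetical helper.

```python
import re
from collections import defaultdict

# "<severity letter>: <package>: <tag> [extra info]"
LINE_RE = re.compile(r"^([A-Z]): ([^:]+): (\S+)(?: (.*))?$")


def group_lintian_tags(output):
    """Map each severity letter to a list of (tag, extra_info) tuples."""
    groups = defaultdict(list)
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            severity, _package, tag, extra = match.groups()
            groups[severity].append((tag, extra or ""))
    return dict(groups)


if __name__ == "__main__":
    sample = """\
E: jpylyzer: unstripped-binary-or-object usr/bin/jpylyzer
W: jpylyzer: hardening-no-fortify-functions usr/bin/jpylyzer
W: jpylyzer: wrong-bug-number-in-closes l3:#nnnn
E: jpylyzer: debian-changelog-file-contains-invalid-email-address johan@unknown
E: jpylyzer: helper-templates-in-copyright
"""
    for severity, tags in sorted(group_lintian_tags(sample).items()):
        print(severity, [tag for tag, _ in tags])
```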

Conclusions

Using the Building Your Debian Package guide on the OPF Wiki I was able to create a rudimentary skeleton structure for Debian packaging. I was also able to build a Debian package for jpylyzer. The exercise revealed some problems with the Debian setup for jpylyzer. The most important ones are:

  • It's unclear how jpylyzer's changelog file is supposed to be generated. Perhaps there's a dependency on some external service (the OPF GitHub 2 Changelog service?), but I cannot find any documentation on how to make this work!
  • The makefile calls PyInstaller in a non-standard and undocumented way. This is easy to fix locally if you are familiar with PyInstaller, but not otherwise. Also, the interfaces of versions 1.5 and 2 of PyInstaller are different, so depending on which version you are running, the makefile may need further changes.
  • Even though I was able to build a Debian package for jpylyzer, it still ended up with some lintian errors.

I also came across a few minor errors in the OPF guide, and left a short comment about them on the guide's wiki page. Overall, I found the guide really helpful: it provides an accessible and relatively painless introduction to the packaging process.

Reference

Building Your Debian Package (OPF Wiki)

Preservation Topics: Open Planets Foundation, SCAPE, jpylyzer
Categories: SCAPE

Adventures in Debian packaging

Open Planets Foundation Blogs - 23 April 2013 - 10:53am

Categories: Planet DigiPres
