Feed aggregator

Tool highlight: SCAPE Online Demos

SCAPE Blog Posts - 23 September 2014 - 2:12pm

Now that we are entering the final days of the SCAPE project, we would like to highlight some SCAPE Quality Assurance tools that have an online demonstrator.

 

See http://scape.demos.opf-labs.org/ for the following tools:

Pagelyzer: Compares web pages

Monitor your web content.

 

Jpylyzer: Validates images

JP2K validator and properties extractor.

 

Xcorr-sound: Compares audio sounds

Improve your digital audio recordings.

 

Flint: Validates different files and formats

Validate PDF/EPUB files against an institutional policy

 

Matchbox: Compares documents (coming soon)

Duplicate image detection tool.

 

For more info on these and other tools and the SCAPE project, see http://scape.usb.opf-labs.org for the content of our SCAPE info USB stick.

 

Preservation Topics: SCAPE
Categories: SCAPE


Interview with a SCAPEr - Ed Fay

Open Planets Foundation Blogs - 23 September 2014 - 12:21pm
Who are you?

My name is Ed Fay, I’m the Executive Director of the Open Planets Foundation.

Tell us a bit about your role in SCAPE and what SCAPE work you are involved in right now?

OPF has been involved in technical and take-up work all the way through the project, but right now we’re focused on sustainability – what happens to all the great results that have been produced after the end of the project.

Why is your organisation involved in SCAPE?

OPF has been responsible for leading the sustainability work and will provide a long-term home for the outputs, preserving the software and providing an ongoing collaboration of project partners and others on best practices and other learning. OPF members include many institutions who have not been part of SCAPE but who have an interest in continuing to develop the products, and through the work that has been done - for example on software maturity and training materials - OPF can help to lower barriers to adoption by these institutions and others.

What are the biggest challenges in SCAPE as you see it?

The biggest challenge in sustainability is identifying a collaboration model that can persist outside of project funding. As cultural heritage budgets are squeezed around the world and institutions adapt to a rapidly changing digital environment the community needs to make best use of the massive investment in R&D that has been made, by bodies such as the EC in projects such as SCAPE. OPF is a sustainable membership organisation which is helping to answer these challenges for its members and provide effective and efficient routes to implementing the necessary changes to working practices and infrastructure. In 20 years we won’t be asking how to sustain work such as this – it will be business as usual for memory institutions everywhere – but right now the digital future is far from evenly distributed.

But from the SCAPE perspective we have a robust plan which encompasses many different routes to adoption, which is of course the ultimate route to sustainability – production use of the outputs by the community for which they were intended. The fact that many outputs are already in active use – as open-source tools and embedded into commercial systems – shows that SCAPE has produced not only great research but mature products which are ready to be put to work in real-world situations.

What do you think will be the most valuable outcome of SCAPE?

This is very difficult for me to answer! Right now OPF has the privileged perspective of transferring everything that has matured during the project into our stewardship - from initial research, through development, and now into mature products which are ready for the community. So my expectation is that there are lots of valuable outputs which are not only relevant in the context of SCAPE but also as independent components. One particular product has already been shortlisted for the Digital Preservation Awards 2014 which is being co-sponsored by OPF this year while others have won awards at DL2014. These might be the most visible in receiving accolades, but there are many other tools and services which provide the opportunity to enhance digital preservation practice within a broad range of institutions. I think the fact that SCAPE is truly cross-domain is very exciting – working with scientific data, cultural heritage, web harvesting – it shows that digital preservation is truly maturing as a discipline.

If there could be one thing to come out of this, it would be an understanding of how to continue the outstanding collaboration that SCAPE has enabled, to sustain cost-effective digital preservation solutions that can be adopted by institutions of all sizes and types.

Contact information

ed@openplanetsfoundation.org

twitter.com/digitalfay

Preservation Topics: SCAPE
Categories: Planet DigiPres


Weirder than old: The CP/M File System and Legacy Disk Extracts for New Zealand’s Department of Conservation

Open Planets Foundation Blogs - 23 September 2014 - 8:14am

We’ve been doing legacy disk extracts at Archives New Zealand for a number of years, with much of the groundwork for this capability laid by colleague Mick Crouch and former Archives New Zealand colleague Euan Cochrane. Earlier this year we received some disks from New Zealand’s Department of Conservation (DoC), which we successfully imaged before extracting what the department needed. While it was a pretty straightforward exercise, there was enough of interest in it to make this blog a good opportunity to document another facet of the digital preservation work we’re doing, in the spirit of adding another war story that others in the community can refer to. We conclude with a few thoughts on where we still relied on a little luck, and what we’ll have to keep in mind moving forward.

We received 32 180 KB 5.25-inch disks from DoC: Maxell MD1-D, single-sided, double-density, containing what we expected to be survey data from circa 1984/1985.

Our goal with these disks, as with any that we are finding outside of a managed records system, is to transfer the data to a more stable medium, as disk images, and then extract the objects on the imaged file system to enable further appraisal. From there a decision will be made about how much more effort should be put into preserving the content and making suitable access copies of whatever we have found – a triage.

For agencies with 3.5-inch floppy disks, we normally help to develop a workflow within that organisation that enables them to manage this work for themselves using more ubiquitous 3.5-inch USB disk drives. With 5.25-inch disks it is more difficult to find suitable floppy disk drive controllers so we try our best at Archives to do this work on behalf of colleagues using equipment we’ve set up using the KryoFlux Universal USB floppy disk controller. The device enables the write-blocked reading, and imaging of legacy disk formats at a forensic level, using modern PC equipment.

We create disk images of the floppies using the KryoFlux and continue to use those images as a master copy for further triage. A rough outline of the process we tend to follow, plus some of its rationale, is documented by Euan Cochrane in his Open Planets Foundation blog post: Bulk disk imaging and disk-format identification with KryoFlux.

Through a small amount of trial and error we discovered that the image format that allowed us to read the most sectors without error was MFM (Modified Frequency Modulation), with the following settings:

  • Image Type: MFM Sector Image
  • Start Track: At least 0
  • End Track: At most 83
  • Side Mode: Side 0
  • Sector Size: 256 Bytes
  • Sector Count: Any
  • Track Distance: 40 Tracks
  • Target RPM: By Image type
  • Flippy Mode: Off

We didn’t experiment to see if these settings could be further optimised as we found a good result. The non-default settings in the case of these disks were side mode zero, sector size 256 bytes, track distance at 40, and flippy mode was turned off.

With the data taken off volatile and unstable media, we now have binary objects that we can attach fixity to and treat using more common digital preservation workflows. We managed to read 30 out of the 32 disks.

Exploding the Disk Images

With the disk images in hand we found ourselves facing our biggest challenge. The images, although clearly well-formed, i.e. not corrupt, would not mount with Virtual Floppy Disk or in Linux.

Successful imaging alone doesn’t guarantee ease of mounting. We still needed to understand the underlying file system.

The images that we’ve seen before have been FAT12, and mount with ease in MS-DOS or Linux. These disks did not share the same identifying signatures at the beginning of the bitstream. We needed a little help in identifying them; fortunately, through forensic investigation and a little experience demonstrated by a colleague, it became quite clear that the disks were CP/M-formatted, with the following ASCII text appearing as-is in the bitstream:

 

*************************
*     MIC-501 V1.6      *
*   62K CP/M VERS 2.2   *
*************************
COPYRIGHT 1983, MULTITECH BIOS VERS 1.6

 

CP/M (Control Program for Microcomputers) is a late-1970s/early-1980s operating system for early Intel microcomputers. The makers of the operating system were approached by IBM about licensing CP/M for its Personal Computer product, but talks failed and IBM went with MS-DOS from Microsoft; the rest is ancient history…

With the knowledge that we were looking at a CP/M file system, we were able to source a mechanism to mount the disks in Windows. Cpmtools is a privately maintained suite of utilities for interacting with CP/M file systems. It was developed for working with CP/M in emulated environments, but works equally well with floppy disks and disk images. The tools are available for Windows and POSIX-compliant systems.

Commands for the different utilities look like the following.

Creating a directory listing:

C:> cpmls -f bw12 disk-images\disk-one.img

This will list the user number (a CP/M concept) and the file objects belonging to that user.

E.g.:

0: File1.txt File2.txt

Extracting objects based on user number:

C:> cpmcp -f bw12 -p -t disk-images\disk-one.img 0:* output-dir

This will extract all objects collected logically under user 0: and put them into an output directory.

Finding the right commands was a little tricky at first, but once the correct set of arguments were found, it was straightforward to keep repeating them for each of the disks.
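Since the same pair of commands simply needs repeating for every image, the job can be scripted. The sketch below is illustrative only, not the script we used: it assumes Python 3 with cpmtools on the PATH, the ‘bw12’ definition described next, and folder names that are purely hypothetical.

# Run cpmls and cpmcp (as shown above) over every disk image in a folder.
import subprocess
from pathlib import Path

IMAGE_DIR = Path("disk-images")      # hypothetical folder of sector images
OUTPUT_ROOT = Path("extracted")

for image in sorted(IMAGE_DIR.glob("*.img")):
    out_dir = OUTPUT_ROOT / image.stem
    out_dir.mkdir(parents=True, exist_ok=True)

    # Keep a directory listing alongside each extract for later appraisal.
    listing = subprocess.run(["cpmls", "-f", "bw12", str(image)],
                             capture_output=True, text=True, check=True)
    (out_dir / "listing.txt").write_text(listing.stdout)

    # Extract everything under user 0 into the per-disk output directory.
    subprocess.run(["cpmcp", "-f", "bw12", "-p", "-t", str(image), "0:*", str(out_dir)],
                   check=True)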

One of the less intuitive values supplied to the command line was the ‘bw12’ disk definition. This refers to a definition file, detailing the layout of the disk. The definition that worked best for our disks was the following:

# Bondwell 12 and 14 disk images in IMD raw binary format
diskdef bw12
  seclen 256
  tracks 40
  sectrk 18
  blocksize 2048
  maxdir 64
  skew 1
  boottrk 2
  os 2.2
end

The majority of the disks extracted well. One small on-image modification we made was the conversion of filenames containing forward slashes. Forward slashes do not play well with Windows, so I took the decision to change the slashes to hashes, editing the bytes in hex, to ensure the objects were safely extracted into the output directory.
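As an illustration of that on-image patch, the directory region can also be edited programmatically rather than in a hex editor. This is a hedged sketch under stated assumptions, not our actual procedure: it assumes a raw sector image laid out exactly as the bw12 definition above describes (two boot tracks of 18 × 256-byte sectors, followed by 64 directory entries of 32 bytes each), and the file names are hypothetical.

# Patch '/' (0x2F) to '#' (0x23) inside CP/M directory filename fields.
from pathlib import Path

SECLEN, SECTRK, BOOTTRK, MAXDIR, ENTRY_LEN = 256, 18, 2, 64, 32
DIR_OFFSET = BOOTTRK * SECTRK * SECLEN        # directory starts 9216 bytes in

def patch_slashes(image_path: str, out_path: str) -> int:
    data = bytearray(Path(image_path).read_bytes())
    patched = 0
    for i in range(MAXDIR):
        entry = DIR_OFFSET + i * ENTRY_LEN
        if data[entry] == 0xE5:               # 0xE5 marks an unused or deleted entry
            continue
        for j in range(1, 12):                # bytes 1-11 hold the 8.3 filename
            if (data[entry + j] & 0x7F) == 0x2F:
                data[entry + j] = 0x23        # replace '/' with '#'
                patched += 1
    Path(out_path).write_bytes(data)
    return patched

# patch_slashes("disk-images/disk-one.img", "disk-images/disk-one-patched.img")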

WordStar and other bits ‘n’ pieces

Content on the disks was primarily WordStar – CP/M’s flavour of word processor. Despite MS-DOS versions of WordStar, the program eventually lost market share to WordPerfect in the 1980s, almost in parallel with the demise of CP/M. It took a little searching to source a converter to turn the WordStar content into something more useful, but we did find something fairly quickly. The biggest issue with viewing WordStar content as-is in a standard text editor is the format’s use of the high-order bit within individual bytes to mark word boundaries, as well as to make other denotations.

Example text, read verbatim, might look like:

thå  southerî coasô = the southern coast

At first, I was sure this was a sign of bit-flipping on less stable media. Again, the experience colleagues had with older formats was useful here, and a consultation with Google soon helped me to understand what we were seeing.
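As a rough illustration of the mechanism (the actual migration used a dedicated converter, described below), masking off the high-order bit of each byte is often enough to make WordStar text readable again; it does not, of course, deal with WordStar’s dot commands or other control codes.

# Strip the WordStar "word boundary" high bit, e.g. "thå southerî coasô" -> "the southern coast".
def strip_high_bits(raw: bytes) -> str:
    return "".join(chr(b & 0x7F) for b in raw)

sample = bytes([0x74, 0x68, 0xE5, 0x20, 0x73, 0x6F, 0x75, 0x74, 0x68, 0x65, 0x72, 0xEE,
                0x20, 0x63, 0x6F, 0x61, 0x73, 0xF4])
print(strip_high_bits(sample))                # -> "the southern coast"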

Looking for various readers or migration tools led me to a number of dead websites, with the Internet Archive coming to the rescue to let us see them: WordStar to other format solutions.

The tool we ended up using was the HABit WordStar Converter, with more information on Softpedia.com. It does bulk conversion of WordStar to plain text or HTML. We didn’t have to worry too much about how faithful the representation would be; as this was just a triage, we were more interested in the intellectual value of the content, or data. Rudimentary preservation of layout would be enough. We were very happy with plain text output, with the option of HTML output too.

Unfortunately, when we approached Henry Bartlett, the developer of the tool, about a small bug in the bulk conversion – the tool neutralises file format extensions on output of the text file, causing naming collisions – we were informed by his wife that he had sadly passed away. I hope it was some reassurance to her to know that, at the very least, his work is still of great use to a good number of people doing format research, and to those who will eventually consume the objects we’re working on.

Conversion was still a little more manual than we’d like had we been dealing with larger numbers of files, but everything ran smoothly. Each of the deliverables was collected and taken back to the parent department on a USB stick, along with the original 5.25-inch disks.

We await further news from DoC about what they’re planning on doing with the extracts next.

Conclusions

The research to complete this work took a couple of weeks overall. With more dedicated time it might have taken a week.

Since completion and delivery to the Department of Conservation, we’ve run through the same process on another set of disks. This took a fraction of the time – possibly an afternoon. The process can be refined with each further iteration.

The next step is to understand the value in what was extracted. This might mean using the extract to source printed copies of the content and concluding that we can dispose of these disks and their content. An even better result might be discovering that there are no other copies of the material and that these digital objects can become records in their own right, with potential for long-term retention. At the very least those conversations can now begin. In the latter instance, we’ll need to understand which of the various deliverables – the disk images, the extracted objects, or the migrated objects – will be considered the record.

Demonstrable value acts like a weight on the scales of digital preservation, where we try to balance effort with value – especially in this instance, where the purpose of the digital material is as yet unknown. This case study is born of a gap in the recordkeeping process that sees the parent department attempting to understand the information in its possession in the absence of other recordkeeping metadata.

Aside from the value in what was extracted, there is still a benefit to us as an archive, and as a team, in working with old technology and equipment. Knowledge gained here will likely prove useful somewhere else down the line.

Identifying the file system could have been a little easier, and so we’d echo the call from Euan in the aforementioned blog post to have identification mechanisms for image formats in DROID-like tools.

Forensic analysis of the disk images and comparing that data to that extracted by CP/M Tools showed a certain amount of data remanence, that is, data that only exists forensically on the disk. It was extremely tempting to do more work with this, but we settled for notifying our contact at DoC, and thus far, we haven’t been called on to extract it.

We required a number of tools to perform this work. How we maintain the knowledge of this work, and how we maintain the tools used, are two important questions. I don’t have an answer for the latter, while this blog serves in some way as documentation of the former.

What we received from DoC was old, but it wasn’t a problem that it was old. The right tools enabled this work to be done fairly easily – and that goes for any organisation willing to put modest tools, such as the KryoFlux and other legacy equipment, in the hands of its analysts and researchers. The disks were in good shape too. The curveball in this instance was that some of the pieces of the puzzle we were interacting with were weirder than expected: a slightly different file system, and a word-processing format that encoded data in an unexpected way, making a 1:1 extract and use a little more difficult. We got around it, though. And indeed, as it stands, this wasn’t a preservation exercise; it was a low-cost and pragmatic exercise to support appraisal, continuity, and potential future preservation. The files have been delivered to DoC in their various forms: disk images, extracted objects, and migrated objects. We’ll await a further nod from them to understand where we go next.

Preservation Topics: Preservation Actions, Identification, Migration, Preservation Risks, Tools
Categories: Planet DigiPres

18 Years of Kairos Webtexts: An interview with Douglas Eyman & Cheryl E. Ball

The Signal: Digital Preservation - 22 September 2014 - 2:05pm

Cheryl E. Ball, associate professor of digital publishing studies at West Virginia University, is editor of Kairos

Since 1996 the electronic journal Kairos has published a diverse range of webtexts, scholarly pieces made up of a range of media and hypermedia. The 18 years of digital journal texts are interesting both in their own right and as a collection of complex works of digital scholarship that illustrate a range of sophisticated issues for ensuring long-term access to new modes of publication. Douglas Eyman, Associate Professor of Writing and Rhetoric at George Mason University, is senior editor and publisher of Kairos. Cheryl E. Ball, associate professor of digital publishing studies at West Virginia University, is editor of Kairos. In this Insights Interview, I am excited to learn about the kinds of issues that this body of work exposes for considering long-term access to born-digital modes of scholarship. [There was also a presentation on Kairos at the Digital Preservation 2014 meeting.]

Trevor: Could you describe Kairos a bit for folks who aren’t familiar with it? In particular, could you tell us a bit about what webtexts are and how the journal functions and operates?

Doug: Webtexts are texts that are designed to take advantage of the web-as-concept, web-as-medium, and web-as-platform. Webtexts should engage a range of media and modes and the design choices made by the webtext author or authors should be an integral part of the overall argument being presented. One of our goals (that we’ve met with some success I think) is to publish works that can’t be printed out — that is, we don’t accept traditional print-oriented articles and we don’t post PDFs. We publish scholarly webtexts that address theoretical, methodological or pedagogical issues which surface at the intersections of rhetoric and technology, with a strong interest in the teaching of writing and rhetoric in digital venues.


Douglas Eyman, Associate Professor of Writing and Rhetoric at George Mason University, is senior editor and publisher of Kairos

(As an aside, there was a debate in 1997-98 about whether or not we were publishing hypertexts, which then tended to be available in proprietary formats and platforms rather than freely on the WWW; founding editor Mick Doherty argued that we were publishing much more than only hypertexts, so we moved from calling what we published ‘hypertexts’ to ‘webtexts’ — Mick tells that story in the 3.1 loggingon column).

Cheryl: WDS (What Doug said ;) One of the ways I explain webtexts to potential authors and administrators is that the design of a webtext should, ideally, enact authors’ scholarly arguments, so that the form and content of the work are inseparable.

Doug: The journal was started by an intrepid group of graduate students, and we’ve kept a fairly DIY approach since that first issue appeared on New Year’s day in 1996. All of our staff contribute their time and talents and help us to publish innovative work in return for professional/field recognition, so we are able to sustain a complex venture with a fairly unique economic model where the journal neither takes in nor spends any funds. We also don’t belong to any parent organization or institution, and this allows us to be flexible in terms of how the editors choose to shape what the journal is and what it does.

Cheryl: We are lucky to have a dedicated staff who are scattered across (mostly) the US: teacher-scholars who want to volunteer their time to work on the journal, and who implement the best practices of pedagogical models for writing studies into their editorial work. At any given time, we have about 25 people on staff (not counting the editorial board).

Doug: Operationally, the journal functions much like any other peer-reviewed scholarly journal: we accept submissions, review them editorially, pass on the ones that are ready for review to our editorial board, engage the authors in a revision process (depending on the results of the peer-review) and then put each submission through an extensive and rigorous copy-, design-, and code-editing process before final publication. Unlike most other journals, our focus on the importance of design and our interest in publishing a stable and sustainable archive mean that we have to add those extra layers of support for design-editing and code review: our published webtexts need to be accessible, usable and conform to web standards.

Trevor: Could you point us to a few particularly exemplary works in the journal over time for readers to help wrap their heads around what these pieces look like? They could be pieces you think are particularly novel or interesting or challenging or that exemplify trends in the journal. Ideally, you could link to it, describe it and give us a sentence or two about what you find particularly significant about it.

Cheryl: Sure! We sponsor an award every year for Best Webtext, and that’s usually where we send people to find exemplars, such as the ones Doug lists below.

Doug: From our peer-reviewed sections, we point readers to the following webtexts (the first two are especially useful for their focus on the process of webtext authoring and editing):

Cheryl: From our editorially (internally) reviewed sections, here are a few other examples:

Trevor: Given the diverse range of kinds of things people might publish in a webtext, could you tell us a bit about the kinds of requirements you have enforced upfront to try and ensure that the works the journal publishes are likely to persist into the future? For instance, any issues that might come up from embedding material from other sites, or running various kinds of database-driven works or things that might depend on external connections to APIs and such.

Doug: We tend to discourage work that is in proprietary formats (although we have published our fair share of Flash-based webtexts) and we ask our authors to conform to web standards (XHTML or HTML5 now). We think it is critical to be able to archive any and all elements of a given webtext on our server, so even in cases where we’re embedding, for instance, a YouTube video, we have our own copy of that video and its associated transcript.

One of the issues we are wrestling with at the moment is how to improve our archival processes so we don’t rely on third-party sites. We don’t have a streaming video server, so we use YouTube now, but we are looking at other options because YouTube allows large corporations to apply bogus copyright-holder notices to any video they like, regardless of whether there is any infringing content (as an example, an interview with a senior scholar in our field was flagged and taken down by a record company; there wasn’t even any background audio that could account for the notice. And since there’s a presumption of guilt, we have to go through an arduous process to get our videos reinstated.) What’s worse is when the video *isn’t* taken down, but the claimant instead throws ads on top of our authors’ works. That’s actually copyright infringement against us that is supported by YouTube itself.

Another issue is that many of the external links in works we’ve published (particularly in older webtexts) tend to migrate or disappear. We used to replace these where we could with links to archive.org (aka The Wayback Machine), but we’ve discovered that their archive is corrupted because they allow anyone to remove content from their archive without reason or notice.[1] So, despite its good intentions, it has become completely unstable as a reliable archive. But we don’t, alas, have the resources to host copies of everything that is linked to in our own archives.

Cheryl: Kairos holds the honor within rhetoric and composition of being the longest-running, and most stable, online journal, and our archival and technical policies are a major reason for that. (It should be noted that many potential authors have told us how scary those guidelines look. We are currently rewriting the guidelines to make them more approachable while balancing the need to educate authors on their necessity for scholarly knowledge-making and -preservation on the Web.)

Of course, being that this field is grounded in digital technology, not being able to use some of that technology in a webtext can be a rather large constraint. But our authors are ingenious and industrious. For example, Deborah Balzhiser et al created an HTML-based interface to their webtext that mimicked Facebook’s interface for their 2011 webtext, “The Facebook Papers.” Their self-made interface allowed them to do some rhetorical work in the webtext that Facebook itself wouldn’t have allowed. Plus, it meant we could archive the whole thing on the Kairos server in perpetuity.

Trevor: Could you give us a sense of the scope of the files that make up the issues? For instance, the total number of files, the range of file types you have, the total size of the data, and or a breakdown of the various kinds of file types (image, moving image, recorded sound, text, etc.) that exist in the run of the journal thus far?

Doug: The whole journal is currently around 20 GB — newer issues are larger in terms of data size because there has been an increase in the use of audio and video (luckily, HTML and CSS files don’t take up a whole lot of room, even with a lot of content in them). At last count, there are 50,636 files residing in 4,545 directories (this count includes things like all the system files for WordPress installs and so on). A quick summary of primary file types:

  • HTML files: 12247
  • CSS: 1234
  • JPG files: 5581
  • PNG: 3470
  • GIF: 7475
  • MP2/3/4: 295
  • MOV: 237
  • PDF: 191
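A tally like the one above can be reproduced with a few lines of scripting; the sketch below simply counts files by extension under an archive directory and is not the journal’s actual metadata-mining pipeline (the path shown in the usage comment is hypothetical).

# Count files by extension under a web archive directory.
from collections import Counter
from pathlib import Path

def tally_extensions(root: str) -> Counter:
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts

# for ext, n in tally_extensions("/var/www/kairos").most_common(20):
#     print(f"{ext:8} {n}")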

Cheryl: In fact, our presentation at Digital Preservation 2014 this year [was] partly about the various file types we have. A few years ago, we embarked on a metadata-mining project for the back issues of Kairos. Some of the fields we mined for included Dublin Core standards such as MIMEtype and DCMIType. DCMIType, for the most part, didn’t reveal too much of interest from our perspective (although I am sure librarians will see it differently!! :) but the MIMEtype search revealed both the range of filetypes we had published and how that range has changed over the journal’s 20-year history. Every webtext has at least one HTML file. Early webtexts (from 1996-2000ish) that have images generally have GIFs and, less prominent, JPEGs. But since PNGs rose to prominence (becoming an international standard in 2003), we began to see more and more of them. The same with CSS files around 2006, after web-standards groups starting enforcing their use elsewhere on the Web. As we have all this rich data about the history of webtextual design, and too many research questions to cover in our lifetimes, we’ve released the data in Dropbox (until we get our field-specific data repository, rhetoric.io, completed).

Trevor: In the 18 years that have transpired since the first issue of Kairos a lot has changed in terms of web standards and functionality. I would be curious to know if you have found any issues with how earlier works render in contemporary web browsers. If so, what is your approach to dealing with that kind of degradation over time?

Cheryl: If we find something broken, we try to fix it as soon as we can. There are lots of 404s to external links that we will never have the time or human resources to fix (anyone want to volunteer??), but if an author or reader notifies us about a problem, we will work with them to correct the glitch. One of the things we seem to fix often is repeating backgrounds. lol. “Back in the days…” when desktop monitors were tiny and resolutions were tinier, it was inconceivable that a background set to repeat at 1200 pixels would ever actually repeat. Now? Ugh.

But we do not change designs for the sake of newer aesthetics. In that respect, the design of a white-text-on-black-background from 1998 is as important a rhetorical point as the author’s words in 1998. And, just as the ideas in our scholarship grow and mature as we do, so do our designs, which have to be read in the historical context of the surrounding scholarship.

Of course, with the bettering of technology also comes our own human degradation in the form of aging and poorer eyesight. We used to mandate webtexts not be designed over 600 pixels wide, to accommodate our old branding system that ran as a 60-pixel frame down the left-hand side of all the webtexts. That would also allow for a little margin around the webtext. Now, designing for specific widths — especially ones that small — seems ludicrous (and too prescriptive), but I often find myself going into authors’ webtexts during the design-editing stage and increasing their typeface size in the CSS so that I can even read it on my laptop. There’s a balance I face, as editor, of retaining the authors’ “voice” through their design and making the webtext accessible to as many readers as possible. Honestly, I don’t think the authors even notice this change.

Trevor: I understand you recently migrated the journal from a custom platform to the Open Journal System platform. Could you tell us a bit about what motivated that move and issues that occurred in that migration?

Doug: Actually, we didn’t do that.

Cheryl: Yeah, I know it sounds like we did from our Digital Preservation 2014 abstract, and we started to migrate, but ended up not following through for technical reasons. We were hoping we could create plug-ins for OJS that would allow us to incorporate our multimedia content into its editorial workflow. But it didn’t work. (Or, at least, it wasn’t possible with the $50,000 NEH Digital Humanities Start-Up Grant we had to work with.) We wanted to use OJS to help streamline and automate our editorial workflow – you know, the parts about assigning reviewers and copy-editors, etc. – and as a way to archive those processes.

I should step back here and say that Kairos has never used a CMS; everything we do, we do by hand — manually SFTPing files to the server, manually making copies of webtext folders in our kludgy way of version control, using YahooGroups (because it was the only thing going in 1998 when we needed a mail system to archive all of our collaborative editorial board discussions) for all staff and reviewer conversations, etc. — not because we like being old school, but because there were always too many significant shortcomings with any out-of-the-box systems given our outside-the-box journal. So the idea of automating, and archiving, some of these processes in a centralized database such as OJS was incredibly appealing. The problem is that OJS simply can’t handle the kinds of multimedia content we publish. And rewriting the code-base to accommodate any plug-ins that might support this work was not in the budget. (We’ve written about this failed experiment in a white paper for NEH.)

[1] Archive.org will obey robots.txt files if they ask not to be indexed. So, for instance, early versions of Kairos itself are no longer available on archive.org because such a file is on the Texas Tech server where the journal lived until 2004. We put that file there because we want Google to point to the current home of the journal, but we actually would like that history to be in the Internet Archive. You can think of this as just a glitch, but here’s the more pressing issue: if I find that someone has posted a critical blog post about my work and I ever get hold of the domain where it was originally posted, I can take it down there *and* retroactively make it unavailable on archive.org, even if it used to show up there. Even without such nefarious purposes, just the constant trade in domains and site locations means that no researcher can trust that archive when using it for history or any kind of digital scholarship.

Categories: Planet DigiPres

How trustworthy is the SCAPE Preservation Environment?

SCAPE Blog Posts - 19 September 2014 - 1:51pm

Over the last three and a half years, the SCAPE project has worked in several directions to propose new solutions for digital preservation, as well as to improve existing ones. One of the results of this work is the SCAPE Preservation Environment (SPE). It is a loosely coupled system that extends existing digital repository systems (e.g. RODA) with several components covering collection profiling (C3PO), preservation monitoring (SCOUT) and preservation planning (Plato). These components address key functionalities defined in the Open Archival Information System (OAIS) functional model.

Establishing trustworthiness of digital repositories is a major concern of the digital preservation community, as it makes the threats and risks within a digital repository understandable. Several approaches have been developed over recent years for addressing trust in digital repositories. The most notable is Trustworthy Repositories Audit and Certification (TRAC), which was later promoted to an ISO standard by the International Organization for Standardization (ISO 16363, released in 2012). The standard comprises three pillars – organizational infrastructure, digital object management, and infrastructure and security management – and for each of these it provides a set of requirements and the expected evidence needed for compliance.

A recently published white paper reports on the work done to validate the SCAPE Preservation Environment against ISO 16363, the framework for Audit and Certification of Trustworthy Digital Repositories. The work aims to demonstrate that a preservation ecosystem composed of building blocks such as those developed in SCAPE is able to comply with most of the system-related requirements of ISO 16363.

Of the 108 metrics included in the assessment, the SPE fully supports 69. 31 metrics were considered “out of scope” as they refer to organisational issues that cannot be solved by technology alone, nor analysed outside the framework of a breathing organisation, leaving 2 metrics considered “partially supported” and 6 considered “not supported”. This gives an overall compliance level of roughly 90% (if the organisation-oriented metrics are not taken into account).
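For readers who want to check the figure, the roughly 90% comes from using only the in-scope metrics as the denominator; the arithmetic is sketched below.

# Compliance arithmetic for the figures quoted above.
total_metrics, out_of_scope = 108, 31
fully, partially, unsupported = 69, 2, 6
in_scope = total_metrics - out_of_scope       # 77 system-related metrics
assert fully + partially + unsupported == in_scope
print(round(100 * fully / in_scope))          # 90 (more precisely, about 89.6%)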

This work also enabled us to identify the main weak points of the SCAPE Preservation Environment that should be addressed in the near future. In summary the gaps found were:

  • The ability to manage and maintain contracts or deposit agreements through the repository user interfaces;
  • Support for tracking intellectual property rights;
  • Improve technical documentation, especially on the conversion of Submission Information Packages (SIP) into Archival Information Packages (AIP);
  • The ability to aid the repository manager to perform better risk management.

Our goal is to ensure that the SCAPE Preservation Environment fully supports the system-related metrics of the ISO 16363. In order to close the gaps encountered, additional features have been added to the roadmap of the SPE.

To get your hands on the full report, please go to http://www.scape-project.eu/wp-content/uploads/2014/09/SCAPE_MS63_KEEPS-V1.0.pdf

 

Preservation Topics: Preservation Strategies, Preservation Risks, SCAPE
Categories: SCAPE


After the Apple Launch, What's Next for U2? - Forbes

Google News Search: "new file format" - 19 September 2014 - 1:23pm


After the Apple Launch, What's Next for U2?
Forbes
In a recent TIME Magazine piece, the band vaguely discussed their intention to revolutionize music with a new file format, one which “will prove so irresistibly exciting to music fans that it will tempt them again into buying music—whole albums ...

Categories: Technology Watch

U2 working with Apple to create new music format to combat piracy - Hollywood.com

Google News Search: "new file format" - 19 September 2014 - 1:14pm

U2 working with Apple to create new music format to combat piracy
Hollywood.com
In a new TIME magazine article, the singer has detailed the group's plans to help combat the illegal downloading of artists' music by creating a new file format which cannot be copied. The aim of the top secret project is to tempt fans to purchase full ...

Categories: Technology Watch

Emerging Collaborations for Accessing and Preserving Email

The Signal: Digital Preservation - 19 September 2014 - 1:02pm

The following is a guest post by Chris Prom, Assistant University Archivist and Professor, University of Illinois at Urbana-Champaign.

I’ll never forget one lesson from my historical methods class at Marquette University. Ronald Zupko – famous for his lecture about the bubonic plague, and a natural showman – was expounding on what it means to interrogate primary sources: to cast a skeptical eye on every source, to see each one as a mere thread of evidence in a larger story, and to remember that every event can, and must, tell many different stories.

He asked us to name a few documentary genres, along with our opinions as to their relative value.  We shot back: “Photographs, diaries, reports, scrapbooks, newspaper articles,” along with the type of ill-informed comments graduate students are prone to make.  As our class rattled off responses, we gradually came to realize that each document reflected the particular viewpoint of its creator–and that the information a source conveyed was constrained by documentary conventions and other social factors inherent to the medium underlying the expression. Settling into the comfortable role of skeptics, we noted the biases each format reflected.  Finally, one student said: “What about correspondence?”  Dr Zupko erupted: “There is the real meat of history!  But, you need to be careful!”


Dangerous Inbox by Recrea HQ. Photo courtesy of Flickr through a CC BY-NC-SA 2.0 license.

Letters, memos, telegrams, postcards: such items have long been the stock-in-trade for archives.  Historians and researchers of all types, while mindful of the challenges in using correspondence, value it as a source for the insider perspective it provides on real-time events.   For this reason, the library and archives community must find effective ways to identify, preserve and provide access to email and other forms of electronic correspondence.

After I researched and wrote a guide to email preservation (pdf) for the Digital Preservation Coalition’s Technology Watch Report series, I concluded that the challenges are mostly cultural and administrative.

I have no doubt that with the right tools, archivists could do what we do best: build the relationships that underlie every successful archival acquisition.  Engaging records creators and donors in their digital spaces, we can help them preserve access to the records that are so sorely needed for those who will write histories.  But we need the tools, and a plan for how to use them.  Otherwise, our promises are mere words.

For this reason, I’m so pleased to report on the results of a recent online meeting organized by the National Digital Stewardship Alliance’s Standards and Practices Working Group.  On August 25, a group of fifty-plus experts from more than a dozen institutions informally shared the work they are doing to preserve email.

For me, the best part of the meeting was that it represented the diverse range of institutions (in terms of size and institutional focus) that are interested in this critical work. Email preservation is not something of interest only to large government archives, or to small collecting repositories, but to every repository in between. That said, the representatives displayed a surprisingly similar vision for how email preservation can be made effective.

Robert Spangler, Lisa Haralampus, Ken Hawkins and Kevin DeVorsey described challenges that the National Archives and Records Administration has faced in controlling and providing access to large bodies of email. Concluding that traditional records management practices are not sufficient to the task, NARA has developed the Capstone approach, which identifies particular accounts to be preserved as a record series, and is currently revising its transfer guidance. Later in the meeting, Mark Conrad described the particular challenge of preserving email from the Executive Office of the President, highlighting the point that “scale matters” – a theme that resonated across the board.

The whole-account approach that NARA advocates meshes well with activities described by other presenters. For example, Kelly Eubank from the North Carolina State Archives and the EMCAP project discussed the need for software tools to ingest and process email records, while Linda Reib from the Arizona State Library noted that the PeDALS Project is seeking to continue their work, focusing on account-level preservation of key state government accounts.


Functional comparison of selected email archives tools/services. Courtesy Wendy Gogel.

Ricc Ferrante and Lynda Schmitz Fuhrig from the Smithsonian Institution Archives discussed the CERP project which produced, in conjunction with the EMCAP project, an XML schema for email objects among its deliverables. Kate Murray from the Library of Congress reviewed the new email and related calendaring formats on the Sustainability of Digital Formats website.

Harvard University was up next. Andrea Goethals and Wendy Gogel shared information about Harvard’s Electronic Archiving Service (EAS). EAS includes tools for normalizing email from an account into EML format (conforming to the Internet Engineering Task Force RFC 2822), then packaging it for deposit into Harvard’s digital repository.
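As an indication of what such normalisation involves (this is a hedged sketch using Python’s standard library, not Harvard’s EAS code), an mbox account can be split into one RFC 2822-conformant .eml file per message:

# Split an mbox mail account into individual .eml message files.
import mailbox
from pathlib import Path

def mbox_to_eml(mbox_path: str, out_dir: str) -> int:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path)
    for i, message in enumerate(box):
        (out / f"{i:06d}.eml").write_bytes(message.as_bytes())
    return len(box)

# mbox_to_eml("account.mbox", "eml-out")      # file names here are illustrative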

One of the most exciting presentations was given by Peter Chan and Glynn Edwards from Stanford University. With generous funding from the National Historical Publications and Records Commission, as well as some internal support, the ePADD Project (“Email: Process, Appraise, Discover, Deliver”) is using natural language processing and entity extraction tools to build an application that will allow archivists and records creators to review email, then process it for search, display and retrieval. Best of all, the web-based application will include a built-in discovery interface, and users will be able to define a lexicon and to provide visual representations of the results. Many participants in the meeting commented that the ePADD tools may provide a meaningful focus for additional collaborations. A beta version is due out next spring.

In the discussion that followed the informal presentations, several presenters congratulated the Harvard team on a slide Wendy Gogel shared, comparing the functions provided by various tools and services (reproduced above).

As is apparent from even a cursory glance at the chart, repositories are doing wonderful work—and much yet remains.

Collaboration is the way forward. At the end of the discussion, participants agreed to take three specific steps to drive email preservation initiatives to the next level: (1) providing tool demo sessions; (2) developing use cases; and (3) working together.

The bottom line: I’m more hopeful about the ability of the digital preservation community to develop an effective approach toward email preservation than I have been in years.  Stay tuned for future developments!

Categories: Planet DigiPres

U2 working with Apple to create new music format to combat piracy - Express.co.uk

Google News Search: "new file format" - 19 September 2014 - 8:55am


U2 working with Apple to create new music format to combat piracy
Express.co.uk
In a new TIME magazine article, the singer has detailed the group's plans to help combat the illegal downloading of artists' music by creating a new file format which cannot be copied. The aim of the top secret project is to tempt fans to purchase full ...
Bono: U2 free album delivery was 'punk rock', now I'll save the music industry – Sydney Morning Herald

Categories: Technology Watch

Time - U2 Working With Apple To Create New Music Format To Combat Piracy - Contactmusic.com

Google News Search: "new file format" - 19 September 2014 - 7:02am


Time - U2 Working With Apple To Create New Music Format To Combat Piracy
Contactmusic.com
In a new Time magazine article, the singer has detailed the group's plans to help combat the illegal downloading of artists' music by creating a new file format which cannot be copied. The aim of the top secret project is to tempt fans to purchase full ...
Bono: U2 free album delivery was 'punk rock', now I'll save the music industry – Sydney Morning Herald

Categories: Technology Watch

Digital Preservation Sustainability on the EU Policy Level

OPF Wiki Activity Feed - 19 September 2014 - 6:53am

Page edited by Jette Junge

Jette Junge 2014-09-19T06:53:53Z

Digital Preservation Sustainability on the EU Policy Level

SCAPE Wiki Activity Feed - 19 September 2014 - 6:53am

Page edited by Jette Junge

Jette Junge 2014-09-19T06:53:53Z
Categories: SCAPE

Bono: U2 free album delivery was 'punk rock', now I'll save the music industry - Sydney Morning Herald

Google News Search: "new file format" - 19 September 2014 - 1:30am


Bono: U2 free album delivery was 'punk rock', now I'll save the music industry
Sydney Morning Herald
Now Bono has told Time magazine that he's working with Apple on an even bigger project: a new file format that could save the music industry. The band's 14th album will be delivered on the new format. He told Time: "[it will be] an audiovisual ...

Categories: Technology Watch

Apple, U2 Working On New Music File Format - PropertyOfZack

Google News Search: "new file format" - 18 September 2014 - 9:59pm


Apple, U2 Working On New Music File Format
PropertyOfZack
In Time's forthcoming cover story, Bono hints that the band's next record is “about 18 months away” and will be released under the new file format. “I think it's going to get very exciting for the music business,” Bono tells Time, “[it will be] an ...
U2, Apple Working on New Music File Format to Combat Piracy, Create an ... – Music Times
U2 And Apple Reveal Their Next Surprise Together – FDRMX

Categories: Technology Watch

U2, Apple Working on New Music File Format to Combat Piracy, Create an ... - Music Times

Google News Search: "new file format" - 18 September 2014 - 3:37pm


U2, Apple Working on New Music File Format to Combat Piracy, Create an ...
Music Times
According to a forthcoming TIME cover story, U2 and Apple are working on a new file format that will be too enticing for music fans to ignore. Bono told the magazine that "he hopes that a new digital music format in the works will prove so irresistibly ...
Apple, U2 Working On New Music File Format – PropertyOfZack
U2 And Apple Reveal Their Next Surprise Together – FDRMX

Categories: Technology Watch

The return of music DRM?

File Formats Blog - 18 September 2014 - 12:58pm

U2, already the most hated band in the world thanks to its invading millions of iOS devices with unsolicited files, isn’t stopping. An article on Time‘s website tells us, in vague terms, that

Bono, Edge, Adam Clayton and Larry Mullen Jr believe so strongly that artists should be compensated for their work that they have embarked on a secret project with Apple to try to make that happen, no easy task when free-to-access music is everywhere (no) thanks to piracy and legitimate websites such as YouTube. Bono tells TIME he hopes that a new digital music format in the works will prove so irresistibly exciting to music fans that it will tempt them again into buying music—whole albums as well as individual tracks.

It’s hard to read this as anything but an attempt to bring digital rights management (DRM) back to online music distribution. Users emphatically rejected it years ago, and Apple was among the first to drop it. You haven’t really “bought” anything with DRM on it; you’ve merely leased it for as long as the vendor chooses to support it. People will continue to break DRM, if only to avoid the risk of loss. The illegal copies will offer greater value than legal ones.

It would be nice to think that what U2 and Apple really mean is just that the new format will offer so much better quality that people will gladly pay for it, but that’s unlikely. Higher-quality formats such as AAC have been around for a long time, and they haven’t pushed the old standby MP3 out of the picture. Existing levels of quality are good enough for most buyers, and vendors know it.

Time implies that YouTube doesn’t compensate artists for their work. This is false. They often don’t bother with small independent musicians, though they will if they’re reminded hard enough (as Heather Dale found out), but it’s hard to believe that groups with powerful lawyers, such as U2, aren’t being compensated for every view.

DRM and force-feeding of albums are two sides of the same coin of vendor control over our choices. This new move shouldn’t be a surprise.


Tagged: Apple, audio, DRM
Categories: Planet DigiPres

Report: Apple and U2 to Debut New Music File Format - Billboard

Google News Search: "new file format" - 18 September 2014 - 12:48pm


Report: Apple and U2 to Debut New Music File Format
Billboard
With consumer behavior, and the record industry with it, predicted to pivot so dramatically towards access over ownership in the next five years, it's questionable whether a new file format, even a low-size, high-quality (and possibly locked down with ...

Categories: Technology Watch