SCAPE/OPF Continuous Integration update
As Carl has previously blogged, virtually all SCAPE and OPF projects are now in Continuous Integration, built and unit tested in both Travis CI and Jenkins.
- Travis compiles the projects and executes unit tests whenever a new commit is pushed to GitHub, or when a pull request is submitted to the project.
- Jenkins builds are generally scheduled once per day. After each build, Sonar analyses the code quality of the software.
Let’s benchmark our Hadoop clusters (join in!)
Introduction
For our evaluations within SCAPE it would be useful to measure the performance of the Hadoop clusters available to us quantitatively, so that results from each cluster can be compared.
Fortunately, the standard Hadoop distribution includes some examples that can be run as tests. Intel has produced a benchmarking suite, HiBench, that uses those included Hadoop examples to produce a set of results.
Dissimilar: an experimental Image Quality Assurance tool
An important part of image file format migration is quality assurance. Various tools can be used, such as ImageMagick or Matchbox, but they either provide only one metric or are intended for different use cases. I wanted to explore how image comparison algorithms are implemented, so I began investigating.
Adventures in setting up access controls in a Fedora Commons repository
We have been evaluating the use of the latest Fedora Commons, version 3.6.2, as a test repository. Having followed the straightforward installation process we were left with a repository with one preconfigured user – fedoraAdmin.
There are two APIs: API-A for access and API-M for management. For our test instance API-A was configured on installation to require a log in, but it can be configured to require no log in. It appeared that whilst the REST API for API-A was restricted, the SOAP API for API-A was not; this was corrected by using the example policy below. Investigations of how to configure multiple users are also detailed.
Validating a migrated image
Part of my work on the SCAPE testbeds involves producing a workflow for the large scale migration of TIFF to JP2 files, with validation. The tests I have run all involve the lossy compression of files.
Two tools that could be used for the validation of image payload, and therefore success of a migration, are Matchbox, developed for SCAPE by AIT, and ImageMagick’s “compare” tool. One of Matchbox’s tests gives a result of SSIM, a value between 0 and 1. The metric I chose to use from “compare” was PSNR, a value in decibels.
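To make the PSNR metric concrete, here is a minimal sketch of the standard textbook definition, PSNR = 10 * log10(MAX^2 / MSE), applied to two flat lists of 8-bit sample values. The function name `psnr` and the `max_value` parameter are illustrative assumptions; this is not the code used by "compare" or Matchbox, and a real tool may differ in details such as channel handling.

```python
import math


def psnr(original, migrated, max_value=255):
    """Peak signal-to-noise ratio in decibels (hypothetical helper).

    Both arguments are flat sequences of sample values of equal length;
    max_value is the largest possible sample (255 assumed for 8-bit data).
    """
    if len(original) != len(migrated):
        raise ValueError("images must contain the same number of samples")
    if not original:
        raise ValueError("images must not be empty")

    # Mean squared error between corresponding samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, migrated)) / len(original)

    # Identical images have no noise, so the ratio is unbounded.
    if mse == 0:
        return math.inf

    return 10 * math.log10(max_value ** 2 / mse)
```

A higher value means the migrated image is closer to the original, and identical inputs yield infinity. With lossy compression the result is finite, so a validation step would need to compare it against a threshold chosen for the collection.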
Mixing Hadoop and Taverna
As part of our work on testbeds for the SCAPE project we have been investigating the various ways in which a large scale file format migration workflow could be implemented. The underlying technologies chosen for the platform are Hadoop and Taverna. One of the aims of the SCAPE project is to allow the automatic generation and execution of Taverna workflows, which will be executed via Hadoop.
The four methods for implementing a file format migration workflow that we tested were:
- Batch execution of a shell script (no parallelisation)
- A workflow written in/controlled from Java, run on Hadoop
- A workflow written in/controlled from Taverna, run on Hadoop
- A workflow written in Taverna, calling an XML defined unit of execution in Hadoop
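The difference between the first method and the others is whether files are processed one at a time or as independent units of work spread across workers. The sketch below illustrates only that contrast, under loud assumptions: `migrate` is a hypothetical placeholder that merely renames a path, and a local thread pool stands in for a cluster. It does not use Hadoop or Taverna.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate(path):
    """Hypothetical stand-in for migrating one file; only derives the output name."""
    return path.rsplit(".", 1)[0] + ".jp2"


def run_batch(paths):
    # No parallelisation: each file is handled in turn, like a shell loop.
    return [migrate(p) for p in paths]


def run_parallel(paths, workers=4):
    # Map-style execution: each file is an independent task, so the work
    # can be spread over several workers (threads here, not cluster nodes).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(migrate, paths))
```

Both functions return results in input order, so they can be swapped without changing downstream steps.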