Web pages are getting more complex than ever. Thus, identifying different elements from web pages, such as main content, menus, user comments, advertising among others, becomes difficult. Web page segmentation refers to the process of dividing a Web page into visually and semantically coherent segments called Blocks or Segments.
First things first. The Github repository with the Audio QA workflows is here: https://github.com/statsbiblioteket/scape-audio-qa. And version 1 is working. Version is really all wrong here. I should call it Workflow 1, which is this one:
One of my first blogs here covered an evaluation of a number of format identification tools. One of the more surprising results of that work was that out of the five tools that were tested, no less than four of them (FITS, DROID, Fido and JHOVE2) failed to even run when executed with their associated launcher script. In many cases the Windows launcher scripts (batch files) only worked when executed from the installation folder. Apart from making things unnecessarily difficult for the user, this also completely flies in the face of all existing conventions on command-line interface design. Around the time of this work (summer 2011) I had been in contact with the developers of all the evaluated tools, and until last week I thought those issues were a thing of the past. Well, was I wrong!