A Weekend With Nanite
Well over a year ago I wrote the "A Year of FITS" (http://www.openplanetsfoundation.org/blogs/2013-01-09-year-fits) blog post, describing how we, over the course of 15 months, characterised 400 million harvested web documents using the File Information Tool Set (FITS) from Harvard University. I presented the technique and the resulting technical metadata, and essentially concluded that FITS was not a good fit for that kind of heterogeneous data in such large amounts. In the time since that experiment, FITS has been improved in several areas, including its code base and the organisation of its development, and it would be interesting to see how far it has evolved for big data. Still, FITS is not what I will be writing about today.
Today I'll present how we characterised more than 250 million web documents, not in nine months, but over a single weekend.
Standing on the Shoulders of Your Peers
