Identification tools, an evaluation
By blekinge on 23 February 2012 – 9:09am
We have created a testing framework based on the Govdocs1 corpus from Digital Corpora (http://digitalcorpora.org/corpora/files), and we use the characterisation results from Forensic Innovations, Inc. (http://www.forensicinnovations.com/) as ground truth.
We have tested Tika 1.0, Fido 0.9.6 and Droid 6.0 with the V45 signature file.
Tika generally performs best across the 20 most common formats. For text files (text/plain) in particular, it is the only tested tool that identifies the files correctly.
Tika is the fastest of the tools, and Fido is the slowest.
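To give a rough idea of how a tool's output can be scored against ground truth per format, here is a minimal sketch in Python. It is not the framework's actual code: the function name `per_format_accuracy`, the file names, and the assumption that both the ground truth and the tool output are available as mappings from file name to MIME type are all hypothetical.

```python
from collections import Counter


def per_format_accuracy(ground_truth, predicted, top_n=20):
    """Return {mime_type: fraction of files identified correctly}
    for the top_n most common MIME types in the ground truth."""
    # How many files of each type the ground truth contains.
    totals = Counter(ground_truth.values())
    # How many of those files the tool labelled with the same type.
    correct = Counter(
        mime
        for name, mime in ground_truth.items()
        if predicted.get(name) == mime
    )
    return {
        mime: correct[mime] / count
        for mime, count in totals.most_common(top_n)
    }


# Hypothetical sample data, not taken from the evaluation itself.
truth = {
    "000001.txt": "text/plain",
    "000002.txt": "text/plain",
    "000003.pdf": "application/pdf",
}
tool_output = {
    "000001.txt": "text/plain",
    "000002.txt": "application/octet-stream",
    "000003.pdf": "application/pdf",
}

accuracy = per_format_accuracy(truth, tool_output)
for mime, score in accuracy.items():
    print(f"{mime}: {score:.0%}")
```

Running the same function over the output of each tool would give comparable per-format scores. A file the tool does not report at all counts as a miss in this sketch.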

