MUPPET itself is not primarily a GUI tool; first and foremost it is intended as an API on top of which a GUI can be placed, if desired. So it is a two-way street: content owners have a ‘pretty’ way to analyse sets of files without having to type complicated commands, and engineers…
I don’t entirely agree with relying on file extensions, because they are too easy to change and I see too many name collisions. Take ‘.DOC’, for example: many word processors with proprietary file formats use ‘.DOC’. In my work, I do my best to avoid having to use a fallback. …
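As a rough illustration of identifying files by content rather than by extension, here is a minimal Java sketch that looks only at the leading "magic" bytes. The class name and the small signature table are illustrative assumptions, not DROID's implementation; DROID matches against the much richer PRONOM signatures. Note that the OLE2 signature alone only says "compound file": it cannot distinguish Word from Excel, which is why deeper analysis (e.g. with Apache POI) is still needed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/**
 * Hypothetical content-based sniffer: classifies a file by its leading bytes
 * and never consults the file name. The signature table is a small sample.
 */
public class MagicSniffer {

    // OLE2 / Compound File Binary (legacy Word .doc, Excel .xls, PowerPoint .ppt, ...)
    static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // WordPerfect documents, which may also carry a .DOC extension
    static final byte[] WPC = {(byte) 0xFF, 'W', 'P', 'C'};
    // ZIP local file header (OOXML .docx, ODF .odt, plain .zip, ...)
    static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // RTF files start with "{\rtf"
    static final byte[] RTF = {'{', '\\', 'r', 't', 'f'};
    // PDF files start with "%PDF-"
    static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Classifies a header by content only. */
    public static String sniff(byte[] header) {
        if (startsWith(header, OLE2)) return "OLE2 compound file";
        if (startsWith(header, WPC)) return "WordPerfect";
        if (startsWith(header, ZIP)) return "ZIP container";
        if (startsWith(header, RTF)) return "RTF";
        if (startsWith(header, PDF)) return "PDF";
        return "unknown";
    }

    /** Reads just enough bytes (8) to cover the longest signature above. */
    public static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            System.out.println(arg + ": " + sniff(Path.of(arg)));
        }
    }
}
```

Because the extension is ignored, two files both named `report.doc`, one saved by Word 97 and one by WordPerfect, would be told apart, while a renamed file would still be recognised.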
As part of the AQuA project, I started writing a basic Office binary-format analyser using Apache POI. The aim there was to identify the version of Office/Word/whatever that created…
As we discussed at iPRES, it would in principle be possible to reveal these dependencies using my kernel tracing technique (see the pre-print), but unfortunately there is no good system call tracing tool for Windows, as far as I can tell…
Thanks for the comments, Dirk.
Identifying “complex objects” (for want of a better term) that include multiple files, each requiring a different rendering application, does seem likely to lead to more complex view paths. However, it should be feasible, in most cases, to use the…
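To make the "more complex view paths" point concrete, here is a deliberately simplified, hypothetical model: each member file's format maps to one view path (application, operating system, platform), and a complex object needs the union of those layers. All names and the three-layer structure are assumptions for illustration, not an established view-path schema.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model of view paths for complex objects. A single file's view
 * path is reduced to three layers; real view paths can be longer.
 */
public class ViewPaths {

    /** One rendering chain: application -> operating system -> platform. */
    public record ViewPath(String application, String operatingSystem, String platform) {}

    /** Collects the distinct environment components needed by all member files. */
    public static Set<String> requiredComponents(List<String> memberFormats,
                                                 Map<String, ViewPath> pathByFormat) {
        Set<String> needed = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String format : memberFormats) {
            ViewPath p = pathByFormat.get(format);
            if (p == null) {
                throw new IllegalArgumentException("no known view path for " + format);
            }
            needed.add(p.application());
            needed.add(p.operatingSystem());
            needed.add(p.platform());
        }
        return needed;
    }
}
```

Shared lower layers (the same operating system or platform) collapse into one requirement, so the combined environment grows mainly with the number of distinct applications.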
Thanks for this suggestion, Matt.
One of Niklas’s first steps was to look into the libraries used by tools such as DROID. Your message confirmed that the Apache POI library was one of the best options for this project.
The findings of this investigation might directly influence the concept of view paths. View paths or pathways, depending on which literature you are…
In DROID we used the Apache POI library to process Office documents. We actually only used the POIFS component (the part that reads OLE2 files as a file system), which works very well. There are also more detailed APIs for dealing with more advanced concepts in Office files.
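In practice one would open a file with POI's `POIFSFileSystem` and iterate the entries of its root `DirectoryNode`. To show what "reading OLE2 files as a file system" involves underneath, here is a stdlib-only sketch (not POI's API; class and method names are hypothetical) that parses the compound-file header and lists the entry names stored in the first directory sector. Offsets follow the published Compound File Binary layout; a real reader such as POIFS must also follow the FAT chain across further directory sectors and handle the mini stream, which this sketch deliberately skips.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical minimal OLE2 peek: header plus first directory sector only. */
public class Ole2Peek {

    public static final byte[] SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    public static List<String> firstDirectorySectorNames(byte[] file) {
        if (file.length < 512 || !Arrays.equals(Arrays.copyOf(file, 8), SIGNATURE)) {
            throw new IllegalArgumentException("not an OLE2 compound file");
        }
        // All multi-byte header fields are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(file).order(ByteOrder.LITTLE_ENDIAN);
        int sectorShift = buf.getShort(30); // 9 => 512-byte sectors (v3), 12 => 4096-byte sectors (v4)
        if (sectorShift != 9 && sectorShift != 12) {
            throw new IllegalArgumentException("unexpected sector shift " + sectorShift);
        }
        int sectorSize = 1 << sectorShift;
        int firstDirSector = buf.getInt(48); // index of the first sector of the directory stream
        List<String> names = new ArrayList<>();
        if (firstDirSector < 0) {
            return names; // special markers (e.g. end-of-chain) are negative as signed ints
        }
        // The header occupies the first sector-sized slot, so sector N starts at (N + 1) * sectorSize.
        long start = ((long) firstDirSector + 1) * sectorSize;
        long end = Math.min(file.length, start + sectorSize);
        // Each directory entry is 128 bytes; only this single directory sector is read.
        for (long off = start; off + 128 <= end; off += 128) {
            int o = (int) off;
            int nameBytes = buf.getShort(o + 64) & 0xFFFF; // includes the UTF-16 null terminator
            int objectType = file[o + 66];                 // 0 = unused, 1 = storage, 2 = stream, 5 = root
            if (objectType == 0 || nameBytes < 2 || nameBytes > 64) {
                continue;
            }
            names.add(new String(file, o, nameBytes - 2, StandardCharsets.UTF_16LE));
        }
        return names;
    }
}
```

For a legacy Word file, the list would typically include the root entry and a `WordDocument` stream; identifying which Word version wrote it requires reading inside that stream, which is where POI's higher-level APIs come in.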
For all who would like to see it before trying it themselves: we had a demo session at iPRES in Singapore during Session 4 – Posters and Demonstrations, on 2nd November in the afternoon. The…
Does this suggest that a generic/homogeneous file format record framework is the key?
If we can share library/registry content easily (through standardised description records) and derive a method of individualising/uniquely identifying where contributions came from, we’ve moved the game…