But you can't. Maybe that's why you're here. Transformenator can help.
It turns out that this is really, really useful when faced with files from ancient word processors, for example. They used all kinds of crazy binary annotations within a file (this is before the days of text markup, remember). With Transformenator, it's easy to swap out those binary annotations for HTML or RTF tags that all of a sudden make those ancient files readable again, maybe even with their original formatting and highlighting intact. Many samples come built into the Transformenator package that can make such file conversions easy.
The list of these samples is available in CVS at this link.
For example:
Where:
- `transform` is the name of a file containing the set of transformations you want to make
- `infile` is the original file
- `outfile` is the resulting file after making all transformations
`infile`, represented in a typical hex editor:
Say we want to change the hex zeroes in the middle to spaces, and eliminate anything after the EOF (0x1a).
Create a transform file named `transform` to do that:
So, we run `infile` through the `transform` file and send the output to a file named `outfile`:
The resulting `outfile` looks like this in a hex editor:
Observe that nulls became spaces, and other trailing stuff after the EOF character has been removed.
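To make the effect of this example concrete, here is a minimal Java sketch of the same byte-level change: nulls become spaces, and the output stops at the EOF marker. It is not Transformenator's own code. The class name `NullsToSpacesSketch`, the method `clean`, and the sample bytes are all made up for illustration, and the sketch assumes the 0x1a byte itself is dropped along with everything after it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Transformenator's actual code.
final class NullsToSpacesSketch {

    /**
     * Copies bytes from the input, writing 0x20 (space) in place of each
     * 0x00, and stops at the first 0x1a (EOF marker).
     * Assumption: the 0x1a byte itself is dropped along with what follows.
     */
    static byte[] clean(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (byte b : input) {
            if (b == 0x1a) {
                break; // discard the EOF marker and everything after it
            }
            out.write(b == 0x00 ? 0x20 : b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up sample data: "AB", two nulls, "CD", EOF, then trailing junk.
        byte[] input = {'A', 'B', 0x00, 0x00, 'C', 'D', 0x1a, 'x', 'y'};
        String result = new String(clean(input), StandardCharsets.US_ASCII);
        System.out.println("[" + result + "]"); // prints "[AB  CD]"
    }
}
```

A transform file expresses the same idea declaratively, so no code like this needs to be written when using Transformenator itself.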
The instructions for making the various transformations you want are simply listed in a plain text file.
A `transform` file might look like this, for example:
The left side of the equals sign, hash mark, or percent sign is the byte pattern to search for. To specify it, the following conventions are used:
There is a special case for specifying returns and newlines. If you need to translate a byte pattern into something that includes them, you need to escape the backslash. For example:
The table below lists the special processing capabilities of Transformenator.
| Construct | Explanation | Example |
|---|---|---|
| | Comment, will be ignored | |
| | Replace the left side of the equals with what's on the right side and move past the replacement looking for the next match | |
| | Replace the left side of the hash with what's on the right side and re-consider the just-replaced data for more matches | |
| | Replace the left side of the percent with what's on the right side, alternating between two values separated by commas | |
| | Shift a byte range byte-for-byte to a different range starting at the value after the equals sign, or remove them altogether | |
| | Used together when a file contains a binary length of file vector | |
| | Specifies a prefix to add to the beginning of the resultant file | |
| | Specifies a suffix to add to the end of the resultant file | |
| | Specifies a number of bytes to trim off the beginning of the input file (in hex bytes) before any binary transforms are applied | |
| | Specifies a number of bytes to trim off the end of the input file (in hex bytes) before any binary transforms are applied. Results are undefined if used in conjunction with the eof_* vectors above | |
| | Specifies a regular expression to run on the output after any and all binary transforms are applied; multiple regular expressions are allowed, one per line; the first character in the string serves as the delimiter | |
| | Special command value signifying the start of file; when the value specified on the left side of the equals is first found, everything up to that point will be discarded | |
| | Special command value signifying the start of file; when the value specified on the left side of the equals is found for the final time in a file, everything up to that point will be discarded | |
| | Special command value signifying the end of file; everything found after that point will be discarded | |
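
The difference between the equals, hash, and percent forms is easiest to see side by side. The following Java sketch is a rough model of the three behaviors described in the table, not Transformenator's implementation. The class `ReplaceModesSketch` and its methods are hypothetical, strings stand in for byte sequences, and it assumes that "re-consider the just-replaced data" means matching resumes at the start of the replacement.

```java
// Hypothetical illustration only; not Transformenator's actual code.
// Strings stand in for byte sequences to keep the example short.
final class ReplaceModesSketch {

    private static boolean matchesAt(StringBuilder buf, int pos, String pattern) {
        return pos + pattern.length() <= buf.length()
                && buf.substring(pos, pos + pattern.length()).equals(pattern);
    }

    /**
     * rescan == false models the equals form: move past each replacement.
     * rescan == true models the hash form: look again at the replaced data
     * (assumption: matching resumes at the start of the replacement).
     */
    static String replace(String input, String pattern, String replacement, boolean rescan) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        if (rescan && replacement.contains(pattern)) {
            // Such a rule would match its own output forever in this sketch.
            throw new IllegalArgumentException("replacement contains pattern");
        }
        StringBuilder buf = new StringBuilder(input);
        int pos = 0;
        while (pos < buf.length()) {
            if (matchesAt(buf, pos, pattern)) {
                buf.replace(pos, pos + pattern.length(), replacement);
                if (!rescan) {
                    pos += replacement.length();
                }
            } else {
                pos++;
            }
        }
        return buf.toString();
    }

    /** Models the percent form: successive matches alternate between two values. */
    static String alternate(String input, String pattern, String first, String second) {
        if (pattern.isEmpty()) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        StringBuilder buf = new StringBuilder(input);
        boolean useFirst = true;
        int pos = 0;
        while (pos < buf.length()) {
            if (matchesAt(buf, pos, pattern)) {
                String value = useFirst ? first : second;
                buf.replace(pos, pos + pattern.length(), value);
                pos += value.length();
                useFirst = !useFirst;
            } else {
                pos++;
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("aaaa", "aa", "a", false));           // prints "aa"
        System.out.println(replace("aaaa", "aa", "a", true));            // prints "a"
        System.out.println(alternate("x*bold*y", "*", "<b>", "</b>"));   // prints "x<b>bold</b>y"
    }
}
```

The alternating form is handy for old formats that use a single toggle byte to turn an attribute such as bold on and off, since the same byte must map to an opening tag and then a closing tag.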
The Transformenator project comes with several example transforms used to convert various early binary word processor formats to RTF, HTML, or plain ASCII text. You can see the list of available "internal" transforms by invoking the transformenator jar with no parameters, like so:
The transformenator.jar includes several supporting utilities.
There is a batch file/shell script, `transformutil.bat`/`.sh`, that simplifies the invocation of all of these utilities.
They are:
| Utility: org.transformenator.util.(...) | Parameters | Summary |
|---|---|---|
| TransformFile | transform infile outfile | Apply transform to a single file - the default operation of transformenator.jar |
| TransformDirectory | transform in_directory out_directory | Apply transform to all files in a filesystem tree recursively |
| TransformDirectory | fix_filenames in_directory | Convert dodgy characters in filenames (unicode, etc.) to DOS-legal characters in a filesystem tree recursively |
| CreateLwpMacro | in_directory out_directory | Creates a Lotus Word Pro macro (lotus2word.lss) to feed to that word processor to transform all files in a filesystem to Word .doc format |
| DOSImage | display infile (or) update infile outfile [force160\|180\|320\|360\|360a\|1200] | View or add the BIOS Parameter Block (BPB) to older DOS disk images to make them mountable - fixes those pesky "no mountable filesystems" errors, also removes the Stoned virus |
| ExtractAdministrativeSystemFile | infile outfile | Extract file from an IBM 5520 Administrative System disk image |
| ExtractCPTFiles | infile [out_directory] | Extract files from CPT word processor disk image |
| ExtractCSV | csv-transform infile outfile.csv | Extract fixed-length records from a file as comma separated values (see csv-transform specification here) |
| ExtractCTOSArchive | infile [out_directory] | Extract files from the tar-like archive in CTOS (incomplete) |
| ExtractDisplaywriterFiles | infile [out_directory] | Extract files from Displaywriter word processor disk image |
| ExtractEasyWriterFiles | infile [out_directory] | Extract EasyWriter word processing files from a 13-sector Apple II disk image |
| ExtractHardFiles | infile [out_directory] | Extract files from an otherwise unknown word processor disk image from FC5025 hard-sector disk capture |
| ExtractHPFiles | infile [out_directory] | Extract files from HP instrument (not LIF) disk image |
| ExtractIBM8Files | infile [out_directory] | Extract files from IBM-formatted 8" disk image |
| ExtractMagiFiles | infile [out_directory] | Extract files from Magi Major Leaguer 8" disk image |
| ExtractMemoryWriterFiles | infile [out_directory] | Extract files from a disk image from an unknown word processor, possibly Xerox MemoryWriter |
| ExtractOfficeSystem6Files | infile [out_directory] | Extract files from IBM Office System 6 disk images |
| ExtractPanasonicFiles | infile [out_directory] | Extract files from Panasonic KX-* word processor disk images |
| ExtractSeawellFiles | infile [out_directory] | Extract Seawell DOS files from 8" disk images |
| ExtractSmithCoronaFiles | infile [out_directory] | Extract files from Smith-Corona typewriter disk images |
| ExtractWangFiles | infile [out_directory] | Extract Wang OIS word processing files from a Wang disk image |
| ExtractXerox860Files | infile [out_directory] | Extract files from Xerox 860 word processor disk image |
| RevealValdocsEntries | infile outfile | "Un-hide" the CP/M directory entries of files that have a leading byte of 0x60 in a disk image - typical of Valdocs files on Epson QX-10 TPM-II disk images |
Each can be invoked with a Java command like this:

`java -cp transformenator.jar org.transformenator.util.TransformDirectory transform in_directory out_directory`