Transformenator (Download)

Transformenation is something that should be possible to do with some rudimentary shell scripting. You should be able to run a binary file through sed or awk and have byte sequences change to different byte sequences.

But you can't. Maybe that's why you're here. Transformenator can help.

It turns out that this is really, really useful when faced with files from ancient word processors, for example. They used all kinds of crazy binary annotations within a file (this is before the days of text markup, remember). With Transformenator, it's easy to swap out those binary annotations for HTML or RTF tags that all of a sudden make those ancient files readable again, maybe even with their original formatting and highlighting intact. Many samples come built into the Transformenator package that can make such file conversions easy.

The list of these samples is available in CVS at this link.

Usage

Invoke the transformenator Java jar file from any command line, ant script, or what have you. A set of rules (comprising a transform) is applied to the input file, and the result is written to the output file. You can use one of the included sample transforms, or you can write your own as simply as creating a text file.

For example:

java -jar transformenator.jar transform infile outfile

Where:
transform is the name of a file containing the set of transformations you want to make
infile is the original file
outfile is the resulting file after making all transformations

Example

Say you have the file below named infile, represented in a typical hex editor:

(Offset) (Hex data)                               (ASCII representation)
-------- ---------------------------------------  ----------------
0000000: 6162 6364 0000 6566 6768 696a 1a67 6574  abcd..efghij.get
0000010: 2072 6964 206f 6620 6d65 210a             rid of me!.

Say we want to change the hex zeroes in the middle to spaces, and eliminate anything after the EOF (0x1a). Create a transform file named transform to do that:

; Make nulls into spaces
00 = 20
; ASCII EOF character really means EOF
1a = "{@@<FiLe_EoF>@@}"

So, we run infile through the transform file and send the output to a file named outfile:

%java -jar transformenator.jar transform infile outfile

The resulting outfile looks like this in a hex editor:

(Offset) (Hex data)                               (ASCII representation)
-------- ---------------------------------------  ----------------
0000000: 6162 6364 2020 6566 6768 696a            abcd  efghij

Observe that nulls became spaces, and other trailing stuff after the EOF character has been removed.

Transform Specification

The instructions for making the various transformations you want are simply listed in a plain text file. A transform file might look like this, for example:

; Change 0x00 to 0x20
00 = 20

The left side of the equals sign, hash mark, or percent sign is the byte pattern to search for. To specify it, the following conventions are used:

A byte pattern, given in even numbers of hex digits, with no other decoration
A byte pattern combined with "don't care" values consisting of even numbers of non-hex digits (periods/full stops are preferred but not enforced)
A byte pattern combined with "must not be zero" values consisting of even numbers of exclamation points
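The three conventions above can be sketched in Python as a small pattern matcher. This is a rough model under stated assumptions, not Transformenator's own parser: it assumes the pattern is read two characters at a time, that "!!" marks a must-not-be-zero byte, and that any other pair that is not two hex digits is a don't-care byte. The names parse_pattern and matches_at are hypothetical.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_pattern(spec: str) -> list:
    """Turn a left-hand-side pattern into one matcher per byte.

    Assumption of this sketch: each pair of characters describes one byte.
    Two hex digits mean an exact value, "!!" means any non-zero byte, and
    any other pair (such as "..") means don't care.
    """
    if len(spec) % 2 != 0:
        raise ValueError("pattern needs an even number of characters")
    matchers = []
    for i in range(0, len(spec), 2):
        pair = spec[i:i + 2]
        if pair == "!!":
            matchers.append("nonzero")
        elif all(c in HEX_DIGITS for c in pair):
            matchers.append(int(pair, 16))
        else:
            matchers.append("any")
    return matchers


def matches_at(data: bytes, pos: int, matchers: list) -> bool:
    """Return True if the pattern matches the bytes starting at pos."""
    if pos + len(matchers) > len(data):
        return False
    for offset, matcher in enumerate(matchers):
        byte = data[pos + offset]
        if matcher == "nonzero":
            if byte == 0:
                return False
        elif matcher != "any" and byte != matcher:
            return False
    return True


# b1, then any byte, then 30
print(matches_at(bytes.fromhex("b12d30"), 0, parse_pattern("b1..30")))  # True
# b1, then a byte that must not be zero, then 30
print(matches_at(bytes.fromhex("b10030"), 0, parse_pattern("b1!!30")))  # False
```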

The right side can be blank, another byte pattern given in undecorated hex, or ASCII text if surrounded by double quotes. Examples:

; Hard newline
8d = 0d0a
; 0x86 appeared at the beginning of files sometimes, after the preamble. Remove it.
86 =
; Start of heading
83 = "\par\par\pard\s2\b\fs36 "
; Italics on
b12d31 = "\i "
; Italics off
b12d30 = "\i0 "

There is a special case for specifying returns and newlines. If you need to translate a byte pattern into something that includes them, you need to escape the backslash. For example:

; Justify on
7a0a1531 = "\\r\\n</pre>"

The table below lists the special processing capabilities of Transformenator.

Construct	Explanation	Example
`;`	Comment, will be ignored	`; Hard newline`
`=`	Replace the left side of the equals sign with what's on the right side and move past the replacement looking for the next match	`8d = 0d0a`
`#`	Replace the left side of the hash mark with what's on the right side and reconsider the just-replaced data for more matches
`%`	Replace the left side of the percent sign with what's on the right side, alternating between two values separated by commas
	Shift a byte range byte-for-byte to a different range starting at the value after the equals sign, or remove the bytes altogether
`eof_*`	Used together when a file contains a binary length-of-file vector. One value gives the least significant byte, one the next most significant byte, and one the most significant byte of the vector; together these values comprise the offset into the file. A further value is a static offset to tune the exact end point in case the vector is offset by some amount.
	Specifies a prefix to add to the beginning of the resultant file
	Specifies a suffix to add to the end of the resultant file
	Specifies a number of bytes to trim off the beginning of the input file (in hex bytes) before any binary transforms are applied
	Specifies a number of bytes to trim off the end of the input file (in hex bytes) before any binary transforms are applied. Results are undefined if used in conjunction with the eof_* vectors above
	Specifies a regular expression to run on the output after any and all binary transforms are applied; multiple regular expressions are allowed, one per line; the first character in the string serves as the delimiter
	Special command value signifying the start of file; when the value specified on the left side of the equals sign is first found, everything up to that point will be discarded
	Special command value signifying the start of file; when the value specified on the left side of the equals sign is found for the final time in a file, everything up to that point will be discarded
`"{@@<FiLe_EoF>@@}"`	Special command value signifying the end of file; everything found after that point will be discarded	`1a = "{@@<FiLe_EoF>@@}"`
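The difference between the equals, hash, and percent forms is easiest to see side by side. The Python sketch below models the three behaviours as described in the table; it is an assumption-laden illustration, not Transformenator's code. The function names are made up here, and the rescanning version only accepts replacements shorter than the pattern so that the loop is guaranteed to finish (a simplification of this sketch, not a documented rule).

```python
from itertools import cycle


def replace_and_skip(data: bytes, old: bytes, new: bytes) -> bytes:
    """Equals-style: after a match, keep scanning after the replacement."""
    return replace_alternating(data, old, new, new)


def replace_alternating(data: bytes, old: bytes, first: bytes, second: bytes) -> bytes:
    """Percent-style: successive matches alternate between two replacements."""
    if not old:
        raise ValueError("pattern must not be empty")
    choices = cycle((first, second))
    out = bytearray()
    i = 0
    while i < len(data):
        if data.startswith(old, i):
            out += next(choices)
            i += len(old)  # move past the matched bytes
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def replace_and_rescan(data: bytes, old: bytes, new: bytes) -> bytes:
    """Hash-style: after a match, look at the just-replaced bytes again."""
    if len(new) >= len(old):
        # Sketch-only restriction: every replacement shrinks the buffer,
        # so the loop below always terminates.
        raise ValueError("this sketch needs a replacement shorter than the pattern")
    buf = bytearray(data)
    i = 0
    while i < len(buf):
        if buf.startswith(old, i):
            buf[i:i + len(old)] = new  # stay at i and reconsider
        else:
            i += 1
    return bytes(buf)


four_returns = bytes.fromhex("0d0d0d0d")
print(replace_and_skip(four_returns, b"\r\r", b"\r").hex())    # 0d0d
print(replace_and_rescan(four_returns, b"\r\r", b"\r").hex())  # 0d
print(replace_alternating(b'"hi" "yo"', b'"', b"<q>", b"</q>"))
# b'<q>hi</q> <q>yo</q>'
```

With the equals-style scan, four carriage returns collapse to two in one pass; the hash-style rescan keeps collapsing until only one is left.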

The Transformenator project comes with several example transforms used to convert various early binary word processor formats to RTF, HTML, or plain ASCII text. You can see the list of available "internal" transforms by invoking the transformenator jar with no parameters, like so:

%java -jar transformenator.jar

The source to these internal transforms is available in CVS at this link.

Utility Functions

The transformenator.jar includes several supporting utilities. There is a batch file/shell script, transformutil.bat/.sh, that simplifies the invocation of all of these utilities. They are:

Utility: `org.transformenator.util.(...)`	Parameters	Summary
`TransformFile`	`transform infile outfile`	Apply transform to a single file - the default operation of transformenator.jar
`TransformDirectory`	`transform in_directory out_directory`	Apply transform to all files in a filesystem tree recursively
`TransformDirectory`	`fix_filenames in_directory`	Convert dodgy characters in filenames (unicode, etc.) to DOS-legal characters in a filesystem tree recursively
`CreateLwpMacro`	`in_directory out_directory`	Creates a Lotus Word Pro macro (lotus2word.lss) to feed to that word processor to transform all files in a filesystem to Word .doc format
`DOSImage`	`display infile` (or) `update infile outfile [force160|180|320|360|360a|1200]`	View or add the BIOS Parameter Block (BPB) to older DOS disk images to make them mountable - fixes those pesky "no mountable filesystems" errors, also removes the Stoned virus
`ExtractAdministrativeSystemFile`	`infile outfile`	Extract file from an IBM 5520 Administrative System disk image
`ExtractCPTFiles`	`infile [out_directory]`	Extract files from CPT word processor disk image
`ExtractCSV`	`csv-transform infile outfile.csv`	Extract fixed-length records from a file as comma separated values (see csv-transform specification here)
`ExtractCTOSArchive`	`infile [out_directory]`	Extract files from the tar-like archive in CTOS (incomplete)
`ExtractDisplaywriterFiles`	`infile [out_directory]`	Extract files from Displaywriter word processor disk image
`ExtractEasyWriterFiles`	`infile [out_directory]`	Extract EasyWriter word processing files from a 13-sector Apple II disk image
`ExtractHardFiles`	`infile [out_directory]`	Extract files from an otherwise unknown word processor disk image from FC5025 hard-sector disk capture
`ExtractHPFiles`	`infile [out_directory]`	Extract files from HP instrument (not LIF) disk image
`ExtractIBM8Files`	`infile [out_directory]`	Extract files from IBM-formatted 8" disk image
`ExtractMagiFiles`	`infile [out_directory]`	Extract files from Magi Major Leaguer 8" disk image
`ExtractMemoryWriterFiles`	`infile [out_directory]`	Extract files from a disk image from an unknown word processor, possibly Xerox MemoryWriter
`ExtractOfficeSystem6Files`	`infile [out_directory]`	Extract files from IBM Office System 6 disk images
`ExtractPanasonicFiles`	`infile [out_directory]`	Extract files from Panasonic KX-* word processor disk images
`ExtractSeawellFiles`	`infile [out_directory]`	Extract Seawell DOS files from 8" disk images
`ExtractSmithCoronaFiles`	`infile [out_directory]`	Extract files from Smith-Corona typewriter disk images
`ExtractWangFiles`	`infile [out_directory]`	Extract Wang OIS word processing files from a Wang disk image
`ExtractXerox860Files`	`infile [out_directory]`	Extract files from Xerox 860 word processor disk image
`RevealValdocsEntries`	`infile outfile`	"Un-hide" the CP/M directory entries of files that have a leading byte of 0x60 in a disk image - typical of Valdocs files on Epson QX-10 TPM-II disk images

Each is invokable with a Java command like this:

java -cp transformenator.jar org.transformenator.util.TransformDirectory transform in_directory out_directory
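If you drive these utilities from a script, the command line can be assembled programmatically. The Python sketch below builds the same java -cp invocation for any utility class name in the table; the helper names utility_command and run_utility are hypothetical, and it assumes java is on the PATH and transformenator.jar is in the current directory.

```python
import subprocess

# Package prefix shared by the utilities listed in the table above.
UTIL_PACKAGE = "org.transformenator.util"


def utility_command(utility: str, *args: str, jar: str = "transformenator.jar") -> list:
    """Build the argument list for one utility invocation."""
    return ["java", "-cp", jar, f"{UTIL_PACKAGE}.{utility}", *args]


def run_utility(utility: str, *args: str, jar: str = "transformenator.jar"):
    """Run the utility and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(utility_command(utility, *args, jar=jar), check=True)


# Same command as the example above, shown as a single string.
print(" ".join(utility_command(
    "TransformDirectory", "transform", "in_directory", "out_directory")))
```

Passing the arguments as a list rather than one shell string avoids quoting problems with directory names that contain spaces.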

Related Projects

Binary Block Editor (bbe) is a sed-like editor for binary files.