Transformenator

Transformenation is something that should be possible to do with some rudimentary shell scripting. You should be able to run a binary file through sed or awk and have byte sequences change to different byte sequences.

But you can't. Maybe that's why you're here. Transformenator can help.

It turns out that this is really, really useful when faced with files from ancient word processors, for example. They used all kinds of crazy binary annotations within a file (this was before the days of text markup, remember). With Transformenator, it's easy to swap out those binary annotations for HTML or RTF tags that make those ancient files readable again, maybe even with their original formatting and highlighting intact. Many samples come built into the Transformenator package that can make such file conversions easy.

The list of these samples is available in CVS at this link.

Usage

Invoke the transformenator Java jar file from any command line, Ant script, or what have you. A set of rules (comprising a transform) is applied to the input file, and the result is written to the output file. You can use one of the included sample transforms, or you can write your own simply by creating a text file.

For example:

java -jar transformenator.jar transform infile outfile

Where:
transform is the name of a file containing the set of transformations you want to make
infile is the original file
outfile is the resulting file after making all transformations

Example

Say you have the file below named infile, represented in a typical hex editor:

(Offset) (Hex data)                              (ASCII representation)
-------- --------------------------------------- ----------------
0000000: 6162 6364 0000 6566 6768 696a 1a67 6574 abcd..efghij.get
0000010: 2072 6964 206f 6620 6d65 210a            rid of me!.

Say we want to change the hex zeroes in the middle to spaces, and eliminate anything after the EOF (0x1a). Create a transform file named transform to do that:

; Make nulls into spaces
00 = 20
; ASCII EOF character really means EOF
1a = "{@@<FiLe_EoF>@@}"

So, we run infile through the transform file and send the output to a file named outfile:

%java -jar transformenator.jar transform infile outfile

The resulting outfile looks like this in a hex editor:

(Offset) (Hex data)                              (ASCII representation)
-------- --------------------------------------- ----------------
0000000: 6162 6364 2020 6566 6768 696a           abcd  efghij

Observe that the nulls became spaces, and that the EOF character and everything after it have been removed.
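To make the example above concrete, here is a minimal Python sketch that reproduces the same result on the same bytes. It is not Transformenator's own code; the function name apply_example_transform is made up for illustration, and the two rules are hard-coded rather than read from a transform file.

```python
# Hypothetical re-creation of the worked example; not Transformenator's code.
EOF_BYTE = 0x1A  # ASCII EOF (ctrl-Z)


def apply_example_transform(data: bytes) -> bytes:
    """Turn nulls into spaces and drop the EOF byte plus everything after it."""
    # Rule: 1a = "{@@<FiLe_EoF>@@}"  -> keep only what precedes the first 0x1a.
    cut = data.find(bytes([EOF_BYTE]))
    if cut != -1:
        data = data[:cut]
    # Rule: 00 = 20  -> replace every null byte with a space.
    return data.replace(b"\x00", b"\x20")


# The same bytes as the hex dump of infile above.
infile = bytes.fromhex(
    "6162 6364 0000 6566 6768 696a 1a67 6574"
    "2072 6964 206f 6620 6d65 210a"
)
outfile = apply_example_transform(infile)
print(outfile)  # b'abcd  efghij'
```

The real tool reads its rules from the transform file, so treat this only as a picture of what the two rules do to the data.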

Transform Specification

The instructions for making the various transformations you want are simply listed in a plain text file. A transform file might look like this, for example:

; Change 0x00 to 0x20
00 = 20

The left side of the equals sign, hash mark, or percent sign is the byte pattern to search for. To specify it, the following conventions are used:

The right side can be blank, another byte pattern given in undecorated hex, or ASCII text if surrounded by double quotes. Examples:

; Hard newline
8d = 0d0a
; 0x86 appeared at the beginning of files sometimes, after the preamble. Remove it.
86 =
; Start of heading
83 = "\par\par\pard\s2\b\fs36 "
; Italics on
b12d31 = "\i "
; Italics off
b12d30 = "\i0 "

There is a special case for specifying returns and newlines. If you need to translate a byte pattern into something that includes them, you need to escape the backslash. For example:

; Justify on
7a0a1531 = "\\r\\n</pre>"
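The three forms of the right side (blank, undecorated hex, quoted ASCII) and the escaped-backslash special case can be sketched in Python as follows. This is one reading of the rules described above, not the tool's actual parser: the name parse_replacement is hypothetical, and it assumes that inside quotes a single backslash is kept literally (as the RTF examples suggest) while \\r and \\n become a carriage return and a newline.

```python
def parse_replacement(spec: str) -> bytes:
    """Turn the right side of a rule into the bytes it stands for (a sketch)."""
    spec = spec.strip()
    if not spec:
        # Blank right side: the matched bytes are simply removed.
        return b""
    if len(spec) >= 2 and spec.startswith('"') and spec.endswith('"'):
        text = spec[1:-1]
        # Assumed special case: an escaped backslash followed by r or n
        # stands for a carriage return or a newline.
        text = text.replace(r"\\r", "\r").replace(r"\\n", "\n")
        # Any other backslash (e.g. RTF's \par) is left untouched.
        return text.encode("ascii")
    # Otherwise the right side is undecorated hex.
    return bytes.fromhex(spec)


print(parse_replacement("0d0a"))                # b'\r\n'
print(parse_replacement(r'"\i0 "'))             # b'\\i0 '
print(parse_replacement(r'"\\r\\n</pre>"'))     # b'\r\n</pre>'
```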

The table below lists the special processing capabilities of Transformenator.
Construct: ;
Explanation: Comment, will be ignored
Example:
    ; This text will not be considered

Construct: =
Explanation: Replace the left side of the equals with what's on the right side and move past the replacement looking for the next match
Example:
    ; Replace nulls with spaces
    00 = 20

Construct: #
Explanation: Replace the left side of the hash with what's on the right side and re-consider the just-replaced data for more matches
Example:
    ; Add a newline - may need to add two
    8d # 0d0a
    0d0a9d = 0d0a0d0a

Construct: %
Explanation: Replace the left side of the percent with what's on the right side, alternating between two values separated by commas
Example:
    ; Toggle italics on and off
    7f047f % "\i1 ","\i0 "

Construct: [xx..yy] = zz
Explanation: Shift a byte range byte-for-byte to a different range starting at the value after the equals sign, or remove the bytes in that range altogether
Example:
    ; Shift 0x41 through 0x5a up to 0x61
    [41..5a] = 61
    ; Remove 0xf0 through 0xff
    [f0..ff] =

Construct: eof_lo, eof_mid, eof_hi, eof_offset
Explanation: Used together when a file contains a binary length-of-file vector. eof_lo is the least significant byte, eof_mid is the next most significant byte, and eof_hi is the most significant byte of the vector. These values together comprise the offset into the file. eof_offset is a static offset to tune the exact end point in case the vector is offset by some amount.
Example:
    ; Calculate the EOF based on file contents
    eof_hi = 00
    eof_mid = 7d
    eof_lo = 7c
    eof_offset = 7f

Construct: head
Explanation: Specifies a prefix to add to the beginning of the resultant file
Example:
    ; Header for HTML format
    head = "<html>"

Construct: tail
Explanation: Specifies a suffix to add to the end of the resultant file
Example:
    ; Trailer for HTML format
    tail = "</html>"

Construct: trim_leading
Explanation: Specifies a number of bytes to trim off the beginning of the input file (in hex bytes) before any binary transforms are applied
Example:
    ; Trim the leading 0x400 bytes
    trim_leading = 0400

Construct: trim_trailing
Explanation: Specifies a number of bytes to trim off the end of the input file (in hex bytes) before any binary transforms are applied. Results are undefined if used in conjunction with the eof_* vectors above.
Example:
    ; Trim the trailing 0xDF bytes
    trim_trailing = df

Construct: regex
Explanation: Specifies a regular expression to run on the output after any and all binary transforms are applied. Multiple regex expressions are allowed, one per line. The first character in the string serves as the delimiter.
Example:
    ; Remove margin specs: zlmxx, :rmxx
    regex = @[z|:][l|r]m[0-9]*@@
    ; Remove next file specification
    regex = @znx:.*@@

Construct: {@@<FiLe_SoF>@@}
Explanation: Special command value signifying the start of file. When the value specified on the left side of the equals is first found, everything up to that point will be discarded.
Example:
    ; SOF is \r\nS
    0d0a53 = "{@@<FiLe_SoF>@@}"

Construct: {@@<FiLe_SoF_GrEeDy>@@}
Explanation: Special command value signifying the start of file. When the value specified on the left side of the equals is found for the final time in a file, everything up to that point will be discarded.
Example:
    ; SOF may appear several times
    1f = "{@@<FiLe_SoF_GrEeDy>@@}"

Construct: {@@<FiLe_EoF>@@}
Explanation: Special command value signifying the end of file. Everything found after that point will be discarded.
Example:
    ; EOF is ctrl-Z
    1a = "{@@<FiLe_EoF>@@}"
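The difference between the =, #, %, and [xx..yy] constructs in the table above is easier to see in running code. The Python sketch below is an interpretation of the table's descriptions, not the tool's implementation; the function names apply_rules, replace_alternating, and shift_range are invented for this illustration.

```python
def apply_rules(data: bytes, rules) -> bytes:
    """Apply (pattern, op, replacement) rules, where op is "=" or "#"."""
    buf = bytearray(data)
    pos = 0
    while pos < len(buf):
        for pattern, op, replacement in rules:
            if buf.startswith(pattern, pos):
                buf[pos:pos + len(pattern)] = replacement
                if op == "=":
                    # "=": move past the replacement before matching again.
                    pos += len(replacement)
                # "#": stay put, so the just-replaced bytes are re-considered.
                # (A "#" rule whose replacement matches itself would loop.)
                break
        else:
            pos += 1  # no rule matched here; try the next byte
    return bytes(buf)


def replace_alternating(data: bytes, pattern: bytes, first: bytes, second: bytes) -> bytes:
    """"%": replace successive matches with first, second, first, ..."""
    parts = data.split(pattern)
    out = bytearray(parts[0])
    for i, part in enumerate(parts[1:]):
        out += (first if i % 2 == 0 else second) + part
    return bytes(out)


def shift_range(data: bytes, lo: int, hi: int, new_start=None) -> bytes:
    """"[lo..hi] = new_start": shift the range, or drop it if new_start is None."""
    out = bytearray()
    for b in data:
        if lo <= b <= hi:
            if new_start is not None:
                out.append(new_start + (b - lo))  # must stay within 0..255
        else:
            out.append(b)
    return bytes(out)


# "8d # 0d0a" followed by "0d0a9d = 0d0a0d0a", as in the table's example.
rules = [(b"\x8d", "#", b"\r\n"), (b"\r\n\x9d", "=", b"\r\n\r\n")]
print(apply_rules(b"A\x8d\x9dB", rules))                    # b'A\r\n\r\nB'
print(shift_range(b"HELLO world", 0x41, 0x5A, 0x61))        # b'hello world'
```

If the first rule used = instead of #, the scan would move past the new 0d0a, the second rule would never see 0d0a9d, and the 0x9d byte would survive. That is the case the # construct exists for.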

The Transformenator project comes with several example transforms used to convert various early binary word processor formats to RTF, HTML, or plain ASCII text. You can see the list of available "internal" transforms by invoking the transformenator jar with no parameters, like so:

%java -jar transformenator.jar

The source to these internal transforms is available in CVS at this link.

Utility Functions

The transformenator.jar includes several supporting utilities. There is a batch file/shell script, transformutil.bat/.sh, that simplifies the invocation of all of these utilities. They are:
(Utility names below are classes under org.transformenator.util.)

Utility: TransformFile
Parameters: transform infile outfile
Summary: Apply transform to a single file - the default operation of transformenator.jar

Utility: TransformDirectory
Parameters: transform in_directory out_directory
Summary: Apply transform to all files in a filesystem tree recursively

Utility: TransformDirectory
Parameters: fix_filenames in_directory
Summary: Convert dodgy characters in filenames (unicode, etc.) to DOS-legal characters in a filesystem tree recursively

Utility: CreateLwpMacro
Parameters: in_directory out_directory
Summary: Creates a Lotus Word Pro macro (lotus2word.lss) to feed to that word processor to transform all files in a filesystem to Word .doc format

Utility: DOSImage
Parameters: display infile (or) update infile outfile [force160|180|320|360|360a|1200]
Summary: View or add the BIOS Parameter Block (BPB) to older DOS disk images to make them mountable - fixes those pesky "no mountable filesystems" errors, also removes the Stoned virus

Utility: ExtractAdministrativeSystemFile
Parameters: infile outfile
Summary: Extract file from an IBM 5520 Administrative System disk image

Utility: ExtractCPTFiles
Parameters: infile [out_directory]
Summary: Extract files from CPT word processor disk image

Utility: ExtractCSV
Parameters: csv-transform infile outfile.csv
Summary: Extract fixed-length records from a file as comma separated values (see csv-transform specification here)

Utility: ExtractCTOSArchive
Parameters: infile [out_directory]
Summary: Extract files from the tar-like archive in CTOS (incomplete)

Utility: ExtractDisplaywriterFiles
Parameters: infile [out_directory]
Summary: Extract files from Displaywriter word processor disk image

Utility: ExtractEasyWriterFiles
Parameters: infile [out_directory]
Summary: Extract EasyWriter word processing files from a 13-sector Apple II disk image

Utility: ExtractHardFiles
Parameters: infile [out_directory]
Summary: Extract files from an otherwise unknown word processor disk image from FC5025 hard-sector disk capture

Utility: ExtractHPFiles
Parameters: infile [out_directory]
Summary: Extract files from HP instrument (not LIF) disk image

Utility: ExtractIBM8Files
Parameters: infile [out_directory]
Summary: Extract files from IBM-formatted 8" disk image

Utility: ExtractMagiFiles
Parameters: infile [out_directory]
Summary: Extract files from Magi Major Leaguer 8" disk image

Utility: ExtractMemoryWriterFiles
Parameters: infile [out_directory]
Summary: Extract files from a disk image from an unknown word processor, possibly Xerox MemoryWriter

Utility: ExtractOfficeSystem6Files
Parameters: infile [out_directory]
Summary: Extract files from IBM Office System 6 disk images

Utility: ExtractPanasonicFiles
Parameters: infile [out_directory]
Summary: Extract files from Panasonic KX-* word processor disk images

Utility: ExtractSeawellFiles
Parameters: infile [out_directory]
Summary: Extract Seawell DOS files from 8" disk images

Utility: ExtractSmithCoronaFiles
Parameters: infile [out_directory]
Summary: Extract files from Smith-Corona typewriter disk images

Utility: ExtractWangFiles
Parameters: infile [out_directory]
Summary: Extract Wang OIS word processing files from a Wang disk image

Utility: ExtractXerox860Files
Parameters: infile [out_directory]
Summary: Extract files from Xerox 860 word processor disk image

Utility: RevealValdocsEntries
Parameters: infile outfile
Summary: "Un-hide" the CP/M directory entries of files that have a leading byte of 0x60 in a disk image - typical of Valdocs files on Epson QX-10 TPM-II disk images

Each is invokable with a Java command like this:

java -cp transformenator.jar org.transformenator.util.TransformDirectory transform in_directory out_directory
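If you are scripting several of these utilities, the command line follows the same pattern each time. The small Python helper below builds that argument list; utility_command is a made-up name, and it assumes java is on the PATH and transformenator.jar is in the current directory.

```python
import shlex


def utility_command(utility: str, *params: str, jar: str = "transformenator.jar") -> list:
    """Build the java command line for one org.transformenator.util utility."""
    return ["java", "-cp", jar, f"org.transformenator.util.{utility}", *params]


cmd = utility_command("TransformDirectory", "transform", "in_directory", "out_directory")
print(shlex.join(cmd))
# To actually run it, pass the list to subprocess.run(cmd, check=True).
```

Building a list rather than a single string keeps file names with spaces intact when the command is handed to subprocess.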

Related Projects

Binary Block Editor (bbe) is a sed-like editor for binary files.

Transformenator Project Page at SourceForge
