Transformenator CSV Transform

The ExtractCSV utility function takes a file and divides it up into fixed-length records with fixed-length fields that you specify and creates a comma-separated values (CSV) file of them. Data from the original file can be interpreted as ASCII text, as EBCDIC text, or as hexadecimal values.

The table below defines the keywords and values used to define an interpretation of a file as records. Note that NEXTREC is required. The case of the keywords is not significant. All numerical values are given in undecorated hexadecimal.
Keyword Explanation Example
; Comment, will be ignored ; This text will not be considered
FirstRec (Optional) Offset (in hex bytes) to the first record to be considered FIRSTREC=0E00
NextRec (Required) Offset (in hex bytes) to the next record NEXTREC=100 ; One record every 0x100 bytes
SelectIf (Optional) criteria that must be met to consider a record for output; specification first lists the offset within the record, followed by a comma, followed by one or more hex bytes that must match. Multiple SelectIf clauses may be used to produce logical OR conditions. SELECTIF=00,40 ; First byte must be 0x40 to be valid SELECTIF=01,20 ; Or, second byte can be 0x20
Layout (Optional) Textual definition of fields based on positions of repeating characters. ; Define field layout by example LAYOUT="mmddyy..nnnnnnxxxxxxxpppppp"

The table below defines export instructions for a single field when there is no LAYOUT definition. Each field definition must contain one each of the following four keywords. Multiple sets of these specifications typically make up a full record to export. The case of the keywords is not significant. Fields can overlap and skip positions as desired. All numerical values are given in undecorated hexadecimal.
Keyword Explanation Example
Name Field name to be exported on header row of output CSV file NAME="Field 1"
Origin Offset within the record to this field's data ORIGIN=01
Length Length of this field's data LENGTH=0a
Interp or InterpLiteral Interpretation of field data to export - one of:
ASCII - ASCII text
EBCDIC - EBCDIC text
HEX - Hexadecimal notation
Text will be encoded in the output CSV file as specified with surrounding quotes. With InterpLiteral, additional syntax and quoting will be added to the output CSV file that convinces spreadsheets not to re-interpret numerical values.
INTERPLITERAL=EBCDIC

Automatic Layout Processing

Alternately, you can let the computer do the computing by specifying textually where the fields are and what to call the fields and how to interpret them. So, given a representation like this:
LAYOUT="aaaaa2222rrrffftttt"
This specifies that there are going to be five fields (each character transition triggers a new field definition) and they have the positional and length specifications as demonstrated directly by the existence of repeating characters. It is not significant what the characters actually are. It is not possible to skip over bytes, and it is not possible to overlap fields when using LAYOUT. When a CSV extraction is performed with a transform containing a LAYOUT specification, the computed field values will be printed to the console so they can be used to refine the specification if things like overlapping fields are required. With a LAYOUT specification, the definition of each field is minimized to just the name and interpretation:
Keyword Explanation Example
Name Field name to be exported on header row of output CSV file NAME="Field 1"
Interp or InterpLiteral Interpretation of field data to export - one of:
ASCII - ASCII text
EBCDIC - EBCDIC text
HEX - Hexadecimal notation
Text will be encoded in the output CSV file as specified with surrounding quotes. With InterpLiteral, additional syntax and quoting will be added to the output CSV file that convinces spreadsheets not to re-interpret numerical values.
INTERPLITERAL=EBCDIC

When there are both LAYOUT and full field (ORIGIN and LENGTH) specifications present, they will all be merged. That is, the "manual" field specifications will be respected for their field names and interpretation types, but the position attributes will be overwritten by the LAYOUT specification. Additional fields specified by LAYOUT but missing accompanying manual definitions will be given default names and interpretation types, and will be given position specifications from the LAYOUT pattern.

When exploring an otherwise unknown disk's files, it is often useful to first dump the entire set of records with a textual representation of the entire thing on one line, then manually inspect those results to ascertain where the field boundaries are based on common patterns in the data. Using the LAYOUT keyword here is an easy way to line up the data with positionally-dependent fields visually.

Examples

Below is an example csv-transform file that interprets a disk image taken from an IBM System/36 floppy disk that was used for record storage: ; ; Custom IBM System/36 to CSV transform ; FIRSTREC=0E00 ; First record starts at 0xE00 NEXTREC=100 ; Records continue every 0x100 bytes SELECTIF=00,40 ; The zeroeth byte needs to be an EBCDIC space ; NAME="Field 1" ; First field definition ORIGIN=01 ; Starts at byte 0x01 (zero-indexed) LENGTH=09 ; Runs for a length of 0x09 bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 2" ; Second field definition ORIGIN=0A ; Starts at byte 0x0a (zero-indexed) LENGTH=1E ; Runs for a length of 0x1e bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 3" ; Third field definition ORIGIN=28 ; Starts at byte 0x28 (zero-indexed) LENGTH=05 ; Runs for a length of 0x05 bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 4" ; Fourth field definition ORIGIN=2d ; Starts at byte 0x2d (zero-indexed) LENGTH=02 ; Runs for a length of 0x02 bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 5" ; Fifth field definition ORIGIN=2f ; Starts at byte 0x2f (zero-indexed) LENGTH=3d ; Runs for a length of 0x3d bytes INTERP=HEX ; Interpret data as hexadecimal bytes

Here is that same example, using the LAYOUT specification:

; ; Custom IBM System/36 to CSV transform ; FIRSTREC=0E00 ; First record starts at 0xE00 NEXTREC=100 ; Records continue every 0x100 bytes SELECTIF=00,40 ; The zeroeth byte needs to be an EBCDIC space LAYOUT="011111111122222222222222222222222222222233333445555555555555555555555555555555555555555555555555555555555555" ; NAME="Ignore" INTERP=EBCDIC ; NAME="Field 1" ; First field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 2" ; Second field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 3" ; Third field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 4" ; Fourth field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 5" ; Fifth field definition INTERP=HEX ; Interpret data as hexadecimal bytes