Transformenator CSV Transform

The ExtractCSV utility function takes a file and divides it up into fixed-length records with fixed-length fields that you specify and creates a comma-separated values (CSV) file of them. Data from the original file can be interpreted as ASCII text, as EBCDIC text, or as hexadecimal values.

The table below defines the keywords and values used to define an interpretation of a file as records. Note that NEXTREC is required. The case of the keywords is not significant. All numerical values are given in undecorated hexadecimal.

Keyword Explanation Example

Comment, will be ignored

(Optional) Offset (in hex bytes) to the first record to be considered

(Required) Offset (in hex bytes) to the next record

(Optional) criteria that must be met to consider a record for output; specification first lists the offset within the record, followed by a comma, followed by one or more hex bytes that must match. Multiple SelectIf clauses may be used to produce logical OR conditions.

(Optional) Textual definition of fields based on positions of repeating characters.

The table below defines export instructions for a single field when there is no LAYOUT definition. Each field definition must contain one each of the following four keywords. Multiple sets of these specifications typically make up a full record to export. The case of the keywords is not significant. Fields can overlap and skip positions as desired. All numerical values are given in undecorated hexadecimal.

Keyword Explanation Example

Field name to be exported on header row of output CSV file

Offset within the record to this field's data

Length of this field's data

or Interpretation of field data to export - one of:
ASCII - ASCII text
EBCDIC - EBCDIC text
HEX - Hexadecimal notation
Text will be encoded in the output CSV file as specified with surrounding quotes. With InterpLiteral, additional syntax and quoting will be added to the output CSV file that convinces spreadsheets not to re-interpret numerical values.

Automatic Layout Processing

Alternately, you can let the computer do the computing by specifying textually where the fields are and what to call the fields and how to interpret them. So, given a representation like this:


LAYOUT="aaaaa2222rrrffftttt"

This specifies that there are going to be five fields (each character transition triggers a new field definition) and they have the positional and length specifications as demonstrated directly by the existence of repeating characters. It is not significant what the characters actually are. It is not possible to skip over bytes, and it is not possible to overlap fields when using LAYOUT. When a CSV extraction is performed with a transform containing a LAYOUT specification, the computed field values will be printed to the console so they can be used to refine the specification if things like overlapping fields are required. With a LAYOUT specification, the definition of each field is minimized to just the name and interpretation:

Keyword	Explanation	Example
	Field name to be exported on header row of output CSV file
or	Interpretation of field data to export - one of: `ASCII` - ASCII text `EBCDIC` - EBCDIC text `HEX` - Hexadecimal notation Text will be encoded in the output CSV file as specified with surrounding quotes. With `InterpLiteral`, additional syntax and quoting will be added to the output CSV file that convinces spreadsheets not to re-interpret numerical values.

When there are both LAYOUT and full field (ORIGIN and LENGTH) specifications present, they will all be merged. That is, the "manual" field specifications will be respected for their field names and interpretation types, but the position attributes will be overwritten by the LAYOUT specification. Additional fields specified by LAYOUT but missing accompanying manual definitions will be given default names and interpretation types, and will be given position specifications from the LAYOUT pattern.

When exploring an otherwise unknown disk's files, it is often useful to first dump the entire set of records with a textual representation of the entire thing on one line, then manually inspect those results to ascertain where the field boundaries are based on common patterns in the data. Using the LAYOUT keyword here is an easy way to line up the data with positionally-dependent fields visually.

Examples

Below is an example csv-transform file that interprets a disk image taken from an IBM System/36 floppy disk that was used for record storage: ; ; Custom IBM System/36 to CSV transform ; FIRSTREC=0E00 ; First record starts at 0xE00 NEXTREC=100 ; Records continue every 0x100 bytes SELECTIF=00,40 ; The zeroeth byte needs to be an EBCDIC space ; NAME="Field 1" ; First field definition ORIGIN=01 ; Starts at byte 0x01 (zero-indexed) LENGTH=09 ; Runs for a length of 0x09 bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 2" ; Second field definition ORIGIN=0A ; Starts at byte 0x0a (zero-indexed) LENGTH=1E ; Runs for a length of 0x1e bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 3" ; Third field definition ORIGIN=28 ; Starts at byte 0x28 (zero-indexed) LENGTH=05 ; Runs for a length of 0x05 bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 4" ; Fourth field definition ORIGIN=2d ; Starts at byte 0x2d (zero-indexed) LENGTH=02 ; Runs for a length of 0x02 bytes INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 5" ; Fifth field definition ORIGIN=2f ; Starts at byte 0x2f (zero-indexed) LENGTH=3d ; Runs for a length of 0x3d bytes INTERP=HEX ; Interpret data as hexadecimal bytes

Here is that same example, using the LAYOUT specification:

; ; Custom IBM System/36 to CSV transform ; FIRSTREC=0E00 ; First record starts at 0xE00 NEXTREC=100 ; Records continue every 0x100 bytes SELECTIF=00,40 ; The zeroeth byte needs to be an EBCDIC space LAYOUT="011111111122222222222222222222222222222233333445555555555555555555555555555555555555555555555555555555555555" ; NAME="Ignore" INTERP=EBCDIC ; NAME="Field 1" ; First field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 2" ; Second field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 3" ; Third field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 4" ; Fourth field definition INTERP=EBCDIC ; Interpret data as EBCDIC ; NAME="Field 5" ; Fifth field definition INTERP=HEX ; Interpret data as hexadecimal bytes

Keyword	Explanation	Example
	Comment, will be ignored
	(Optional) Offset (in hex bytes) to the first record to be considered
	(Required) Offset (in hex bytes) to the next record
	(Optional) criteria that must be met to consider a record for output; specification first lists the offset within the record, followed by a comma, followed by one or more hex bytes that must match. Multiple `SelectIf` clauses may be used to produce logical OR conditions.
	(Optional) Textual definition of fields based on positions of repeating characters.

Keyword	Explanation	Example
	Field name to be exported on header row of output CSV file
	Offset within the record to this field's data
	Length of this field's data
or	Interpretation of field data to export - one of: `ASCII` - ASCII text `EBCDIC` - EBCDIC text `HEX` - Hexadecimal notation Text will be encoded in the output CSV file as specified with surrounding quotes. With `InterpLiteral`, additional syntax and quoting will be added to the output CSV file that convinces spreadsheets not to re-interpret numerical values.