DataPrep Concepts

This implementation of Data Prep uses the concepts of Record, Column, Directive, Step, and Pipeline.

Record

A Record is a collection of field names and field values.

Column

A Column is a data value of any of the supported Java types, one for each record.

Directive

A Directive is a single data manipulation instruction, specified to either transform, filter, or pivot a single record into zero or more records. A directive can generate one or more steps to be executed by a pipeline.

Step

A Step is an implementation of a data transformation function, operating on a single record or set of records. A step can generate zero or more records from the application of a function.

Pipeline

A Pipeline is a collection of steps to be applied on a record. The record(s) outputed from a step are passed to the next step in the pipeline.

Notations

Directives

A directive can be represented in text in this format:

<command> <argument-1> <argument-2> ... <argument-n>

Record

A record in this documentation will be shown as a JSON object with an object key representing the column names and a value shown by the plain representation of the the data, without any mention of types.

For example:

{
  "id": 1,
  "fname": "root",
  "lname": "joltie",
  "address": {
    "housenumber": "678",
    "street": "Mars Street",
    "city": "Marcity",
    "state": "Maregon",
    "country": "Mari"
  },
  "gender": "M"
}