DataPrep Concepts

This implementation of Data Prep uses the concepts of Record, Column, Directive, Step, and Pipeline.


A Record is a collection of field names and field values.


A Column is a named field that holds one data value per record; the value can be of any of the supported Java types.


A Directive is a single data manipulation instruction, specified to either transform, filter, or pivot a single record into zero or more records. A directive can generate one or more steps to be executed by a pipeline.


A Step is an implementation of a data transformation function, operating on a single record or set of records. A step can generate zero or more records from the application of a function.


A Pipeline is an ordered collection of steps applied to a record. The records output by one step are passed to the next step in the pipeline.
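
The relationship between records, steps, and a pipeline can be sketched in Java as follows. This is a minimal illustration and not the project's actual API: the PipelineSketch class, the Step interface, the two example steps, and the use of Map<String, Object> to stand in for a record are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; a sketch of the concepts, not the real API.
class PipelineSketch {

    // A step maps one record to zero or more output records.
    interface Step {
        List<Map<String, Object>> apply(Map<String, Object> record);
    }

    // Applies the steps in order; each step sees every record the previous step produced.
    static List<Map<String, Object>> run(List<Step> steps, Map<String, Object> input) {
        List<Map<String, Object>> current = new ArrayList<>();
        current.add(input);
        for (Step step : steps) {
            List<Map<String, Object>> next = new ArrayList<>();
            for (Map<String, Object> record : current) {
                next.addAll(step.apply(record));
            }
            current = next;
        }
        return current;
    }

    // Example transform step: upper-cases the "fname" column.
    static Step upperCaseFname() {
        return record -> {
            Map<String, Object> out = new LinkedHashMap<>(record);
            out.put("fname", String.valueOf(out.get("fname")).toUpperCase());
            return Collections.singletonList(out);
        };
    }

    // Example filter step: drops records whose "gender" column is not "M".
    static Step keepGenderM() {
        return record -> {
            if ("M".equals(record.get("gender"))) {
                return Collections.singletonList(record);
            }
            return Collections.emptyList();
        };
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("fname", "root");
        record.put("gender", "M");

        List<Step> steps = new ArrayList<>();
        steps.add(upperCaseFname());
        steps.add(keepGenderM());

        System.out.println(run(steps, record));
    }
}
```

Because every step returns a list, a filter is simply a step that returns an empty list, and a step that splits a record returns several entries.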



A directive can be represented in text in this format:

<command> <argument-1> <argument-2> ... <argument-n>
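
As a rough illustration of this text format, the Java sketch below splits a directive into its command and arguments. It is an assumption-laden simplification: the DirectiveText class is hypothetical, the directive string used in main is only an example, and splitting on whitespace ignores any quoting or escaping rules the real parser may apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical class; a naive whitespace tokenizer, not the project's parser.
class DirectiveText {
    final String command;
    final List<String> arguments;

    DirectiveText(String command, List<String> arguments) {
        this.command = command;
        this.arguments = Collections.unmodifiableList(new ArrayList<>(arguments));
    }

    // Treats the first token as the command and the remaining tokens as arguments.
    static DirectiveText parse(String text) {
        String[] tokens = text.trim().split("\\s+");
        if (tokens[0].isEmpty()) {
            throw new IllegalArgumentException("directive has no command");
        }
        return new DirectiveText(tokens[0], Arrays.asList(tokens).subList(1, tokens.length));
    }

    public static void main(String[] args) {
        // Illustrative directive text only.
        DirectiveText d = parse("rename fname firstname");
        System.out.println(d.command + " " + d.arguments);
    }
}
```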


In this documentation, a record is shown as a JSON object whose keys are the column names and whose values are the plain representation of the data, without any mention of types.

For example:

  "id": 1,
  "fname": "root",
  "lname": "joltie",
  "address": {
    "housenumber": "678",
    "street": "Mars Street",
    "city": "Marcity",
    "state": "Maregon",
    "country": "Mari"
  "gender": "M"