Split to Rows

The SPLIT-TO-ROWS directive splits a column based on a separator into multiple records, making a copy of the record for each column value generated by splitting.

Syntax

split-to-rows <column> <separator>

The <column> is split based on the <separator>, which can be defined as a regular expression (regex).

Usage Notes

The SPLIT-TO-ROW directive takes a column, applies the regular expression separator, and then creates multiple rows from the split.

Regular expressions allows the use of complex search patterns when splitting the data in the column. It supports standard Java regular expression constructs.

If the <separator> regular expression pattern does not match any part of the input string, then no additional rows are generated.

Where there is a match, and the match splits the input string into n strings, then n rows created by copying other columns will be generated.

Note: This directive can only operate on columns of type string.

Examples

If we have a <separator> pattern of \n (a linefeed) over the string:

This will be split 1\nThis will be split 2

This will generate two output records with the column having these values:

[
  {  "column": "This will be split 1" },
  {  "column": "This will be split 2" }
]

Using this record as an example:

{
  "id": 1,
  "codes": "USD|AUD|AMD|XCD",
}

Applying this directive:

split-to-rows codes \|

Note: A backslash is required to escape the pipe character (|) as it is an optional separator in a regex pattern.

This would result in four records being generated, with each split value being assigned to the column codes:

[
  { "id": 1, "codes": "USD" },
  { "id": 1, "codes": "AUD" },
  { "id": 1, "codes": "AMD" },
  { "id": 1, "codes": "XCD" }
]