Examples

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15
Suppose your input field contains a text value like "Author, Ann" - 53 posts. (the trailing period is part of the value). You can use the following regular expression, which creates four capturing groups, to parse out the different parts:
^"(([^"]+), ([^"]+))" - (\d+) posts\.$

This expression creates the following four capturing groups, which become output fields:

  • Fullname: (([^"]+), ([^"]+))
  • Lastname: ([^"]+)
  • Firstname: ([^"]+)
  • Number of posts: (\d+)

In this example, a field definition must be present for each of these capturing groups.
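
As an illustration only, the following Java sketch applies the same expression with java.util.regex and maps each capturing group to a field name. The class name RegexEvalSketch, the parse method, and the FIELDS array are hypothetical names chosen for this example; they are not part of the step, and the sketch does not show how the step is implemented internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexEvalSketch {
    // Same expression as above; quotes and backslashes are escaped for a Java string literal.
    static final Pattern POSTS =
        Pattern.compile("^\"(([^\"]+), ([^\"]+))\" - (\\d+) posts\\.$");

    // Hypothetical field names, listed in capturing-group order.
    // Groups are numbered by the position of their opening parenthesis.
    static final String[] FIELDS = {"Fullname", "Lastname", "Firstname", "Number of posts"};

    // Returns one entry per capturing group, or an empty map if the input does not match.
    static Map<String, String> parse(String input) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = POSTS.matcher(input);
        if (m.matches()) {
            for (int i = 0; i < FIELDS.length; i++) {
                out.put(FIELDS[i], m.group(i + 1)); // group 0 is the whole match
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints {Fullname=Author, Ann, Lastname=Author, Firstname=Ann, Number of posts=53}
        System.out.println(parse("\"Author, Ann\" - 53 posts."));
    }
}
```

An input without the trailing period, such as "Author, Ann" - 53 posts, does not match because the expression ends with \.$ and the sketch returns an empty map.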

If the number of capturing groups in the regular expression does not match the number of fields specified, the step fails and writes an error to the log. Capturing groups can be nested. In the example above, the Lastname and Firstname fields correspond to capturing groups that are themselves contained inside the Fullname capturing group.
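
To check an expression before configuring the fields, you can count its capturing groups yourself. The sketch below is one possible way to do this in Java, assuming the standard Matcher.groupCount() method; the class GroupCountCheck and the method groupsMatchFields are hypothetical helpers, not part of the product, and they only approximate the check described above.

```java
import java.util.regex.Pattern;

public class GroupCountCheck {
    // Returns true when the expression defines exactly one capturing group per field name.
    // Nested groups are counted too, so the example expression yields four groups.
    static boolean groupsMatchFields(String regex, String[] fieldNames) {
        // groupCount() reports the groups defined in the pattern; the input text is irrelevant here.
        int groups = Pattern.compile(regex).matcher("").groupCount();
        return groups == fieldNames.length;
    }

    public static void main(String[] args) {
        String regex = "^\"(([^\"]+), ([^\"]+))\" - (\\d+) posts\\.$";
        String[] fourFields = {"Fullname", "Lastname", "Firstname", "Number of posts"};
        String[] threeFields = {"Lastname", "Firstname", "Number of posts"};

        System.out.println(groupsMatchFields(regex, fourFields));  // prints true
        System.out.println(groupsMatchFields(regex, threeFields)); // prints false
    }
}
```

In the second call, the nested Fullname group still counts as a capturing group, so three field names are not enough for this expression.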

The design-tools/data-integration/samples/transformations directory contains the sample transformation Regex Eval - parse NCSA access log records.ktr, which is another example of how to use this step.