Using the Group By step on the Spark engine

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

You can set up the Group By step to run on the Spark engine.

When Spark is processing the transformation, the rows are sorted by the group fields. The field names cannot contain spaces, dashes, or special characters. Each field name must start with a letter.
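The naming rule above can be checked before a transformation is submitted to Spark. The following is a minimal sketch in plain Python, not part of Pentaho: the helper name `invalid_group_fields` is hypothetical, and the pattern assumes that "special characters" means anything other than ASCII letters, digits, and underscores. The source does not state whether underscores are allowed, so treat that as an assumption.

```python
import re

# Assumed rule: an ASCII letter first, then only letters, digits, or underscores.
# Allowing underscores is an assumption; the documentation does not list them.
_FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def invalid_group_fields(field_names):
    """Return the names that break the assumed naming rule, in input order."""
    return [name for name in field_names if not _FIELD_NAME.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical field names, for illustration only.
    fields = ["customer_id", "order total", "ship-date", "2023_sales", "region"]
    print(invalid_group_fields(fields))
    # prints ['order total', 'ship-date', '2023_sales']
```

Renaming the flagged fields earlier in the transformation, before they reach the Group By step, is one way to satisfy the rule.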

Optionally, you can use the Sort step before the Group By step. If your existing transformation contains a Sort step before the Group By step, the transformation still runs successfully.
Note: The Group By and the Memory Group By steps work the same way on the Spark engine.
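One way to picture why a preceding Sort step is optional here is the following analogy in plain Python. It is not Pentaho or Spark code, and the function names and sample rows are hypothetical. It contrasts a grouping that merges only consecutive rows with the same key (which therefore needs sorted input) with a grouping that accumulates totals in memory (which does not). Once the rows are sorted by the group field, both produce the same result.

```python
from itertools import groupby
from operator import itemgetter


def group_sorted(rows, key_index=0, value_index=1):
    """Sum values over runs of consecutive rows that share the same key."""
    return [
        (key, sum(row[value_index] for row in run))
        for key, run in groupby(rows, key=itemgetter(key_index))
    ]


def group_in_memory(rows, key_index=0, value_index=1):
    """Sum values per key with a dictionary; input order does not matter."""
    totals = {}
    for row in rows:
        key = row[key_index]
        totals[key] = totals.get(key, 0) + row[value_index]
    return sorted(totals.items())


if __name__ == "__main__":
    # Hypothetical (region, amount) rows, deliberately unsorted.
    rows = [("east", 10), ("west", 5), ("east", 7), ("west", 1)]

    # Unsorted input splits each key into several partial groups.
    print(group_sorted(rows))
    # Sorting by the group field first gives one total per key.
    print(group_sorted(sorted(rows, key=itemgetter(0))))
    # The in-memory variant gives the same totals without a prior sort.
    print(group_in_memory(rows))
```

In this sketch, sorting by the group field is what makes the two approaches agree, which mirrors the sort by group fields described above.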