Some ETL activities are lightweight, such as loading a small text file and writing it to a database, or filtering a few rows to trim down your results. For these activities, you can run your transformation locally using the default Pentaho engine. Some ETL activities are more demanding, containing many steps calling other steps or a network of transformation modules. For these activities, you can set up a separate Pentaho Server dedicated to running transformations using the Pentaho engine. Other ETL activities involve large amounts of data on network clusters and require greater scalability and reduced execution times. For these activities, you can run your transformation using the Spark engine in a Hadoop cluster.
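The three cases above amount to a simple decision rule, which can be sketched in Python. This is only an illustration and not part of any Pentaho API: the `Target` enum, the `suggest_target` function, and both threshold values are hypothetical, chosen here to make the rule concrete. The right cut-off points depend on your own environment.

```python
from enum import Enum


class Target(Enum):
    """Where a transformation could run (labels are illustrative)."""
    LOCAL_PENTAHO = "Pentaho engine, local"
    PENTAHO_SERVER = "Pentaho engine, dedicated Pentaho Server"
    SPARK_CLUSTER = "Spark engine, Hadoop cluster"


def suggest_target(step_count: int, input_gb: float,
                   many_steps: int = 50, large_gb: float = 100.0) -> Target:
    """Suggest an execution target from rough workload size.

    The thresholds `many_steps` and `large_gb` are assumptions for this
    sketch; the documentation does not define numeric limits.
    """
    # Large data volumes benefit most from cluster scalability.
    if input_gb >= large_gb:
        return Target.SPARK_CLUSTER
    # Many interdependent steps suggest a dedicated server.
    if step_count >= many_steps:
        return Target.PENTAHO_SERVER
    # Everything else can run locally on the default engine.
    return Target.LOCAL_PENTAHO
```

For example, `suggest_target(step_count=5, input_gb=0.01)` returns `Target.LOCAL_PENTAHO` under these assumed thresholds.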
Run configurations let you select whether a transformation runs on the Pentaho (Kettle) engine or the Spark engine. You can create or edit these configurations through the Run configurations folder in the View tab.
To create a new run configuration, right-click the Run configurations folder and select New. To edit or delete a run configuration, right-click the existing configuration and select Edit or Delete.
Selecting New or Edit opens the Run configuration dialog box, which contains the following fields:
Field | Description |
---|---|
Name | Specify the name of the run configuration. |
Description | Optionally, specify details of your configuration. |
Engine | Select the type of engine for running a transformation. You can run a transformation with either a Pentaho or a Spark engine. The fields displayed in the Settings section of the dialog box depend on which engine you select. |
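As a way to reason about the fields in the table above, the dialog can be modeled as a small record with validation. The following Python sketch is hypothetical: the `RunConfiguration` class and `ENGINES` tuple are names invented for illustration, not Pentaho's internal representation, and treating Pentaho as the default engine is an assumption. Engine-specific settings are left out because they depend on the engine selected.

```python
from dataclasses import dataclass

# The two engine choices named in the Engine field description.
ENGINES = ("Pentaho", "Spark")


@dataclass(frozen=True)
class RunConfiguration:
    """Illustrative model of the Run configuration dialog fields."""
    name: str                  # required
    engine: str = "Pentaho"    # assumed default for this sketch
    description: str = ""      # optional details

    def __post_init__(self) -> None:
        # A configuration needs a non-blank name to be identifiable.
        if not self.name.strip():
            raise ValueError("Name must not be empty")
        # Only the two documented engines are accepted.
        if self.engine not in ENGINES:
            raise ValueError(f"Engine must be one of {ENGINES}")
```

For example, `RunConfiguration(name="Nightly load", engine="Spark")` is accepted, while an empty name or an unknown engine raises `ValueError`.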