Job settings tab

Pentaho Data Integration

Version: 9.3.x
Audience: anonymous
Part Number: MK-95PDIA003-15


Job settings tab, Amazon EMR Job Executor

This tab includes the following options:

EMR job flow name: Specify the name of the Amazon EMR job flow to execute.

S3 staging directory: Specify the Amazon Simple Storage Service (S3) address of the working directory for this Hadoop job. This directory contains the MapReduce JAR and log files.

MapReduce Jar: Specify the address of the Java JAR that contains your Hadoop mapper and reducer classes. The job must be configured and submitted using a static main method in any class of the JAR.

Command line arguments: Enter any command line arguments you want to pass into the static main method of the specified MapReduce Jar. Use spaces to separate multiple arguments.

Keep job flow alive: Select this option to keep your job flow active after the PDI entry finishes. If this option is not selected, the job flow terminates when the PDI entry finishes.

Enable blocking: Select this option to force the job to wait until each PDI entry completes before continuing to the next entry. Blocking is the only way for PDI to be aware of the status of a Hadoop job, and selecting this option also enables proper error handling and routing. When you clear this option, the Hadoop job is executed without monitoring and PDI immediately moves on to the next entry.

Logging interval: If Enable blocking is selected, specify the number of seconds between status log messages.
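The MapReduce Jar and Command line arguments options above rely on a simple contract: PDI invokes the static main method of a class in your JAR and passes the space-separated arguments to it. The following minimal sketch illustrates that contract; the class name and the input/output argument convention are hypothetical, and the actual Hadoop job-submission calls are shown only in comments so the skeleton stays self-contained:

```java
// Hypothetical driver class inside the MapReduce Jar. PDI calls its static
// main method with the space-separated "Command line arguments" split into
// the args array.
public class EmrDriverSketch {

    // Mirrors the argument validation a real driver would perform before
    // configuring the job. Expects an input path and an output path
    // (an assumed convention, not required by PDI).
    static String validateArgs(String[] args) {
        if (args.length < 2) {
            return "usage: <input path> <output path>";
        }
        return "input=" + args[0] + " output=" + args[1];
    }

    public static void main(String[] args) {
        System.out.println(validateArgs(args));
        // A real driver would now configure and submit the Hadoop job, e.g.:
        //   Job job = Job.getInstance(new Configuration(), "my-emr-job");
        //   job.setJarByClass(EmrDriverSketch.class);
        //   ... set mapper/reducer classes and input/output paths ...
        //   System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because PDI splits the Command line arguments field on spaces, the string `in.txt out` would arrive here as `args = {"in.txt", "out"}`.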