Job Setup tab

Pentaho Data Integration

Version: 9.3.x
Part Number: MK-95PDIA003-15


Job Setup tab, Pentaho MapReduce

The following table describes the options for setting up the inputs and outputs of the job:

Option: Definition

Input path: Enter the path of the input directory on your Hadoop cluster, such as /wordcount/input, where the source data for the MapReduce job is stored. To specify multiple input directories, use a comma-separated list.

Output path: Enter the path of the directory on your Hadoop cluster, such as /wordcount/output, where you want the output from the MapReduce job to be stored. Note: The output directory must not exist before the MapReduce job runs.

Remove output path before job: Select to remove the specified output path before the MapReduce job is scheduled.

Input format: Enter the Apache Hadoop class name that describes the input specification for the MapReduce job. See InputFormat for more information.

Output format: Enter the Apache Hadoop class name that describes the output specification for the MapReduce job. See OutputFormat for more information.

Ignore output of map key: Select to ignore the key output from the mapper transformation and replace it with NullWritable.

Ignore output of map value: Select to ignore the value output from the mapper transformation and replace it with NullWritable.

Ignore output of reduce key: Select to ignore the key output from the combiner and/or reducer transformations and replace it with NullWritable. This option requires a reducer transformation, not the Identity Reducer.

Ignore output of reduce value: Select to ignore the value output from the combiner and/or reducer transformations and replace it with NullWritable. This option requires a reducer transformation, not the Identity Reducer.
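
The following Python sketch illustrates how several of these options interact. It is a simplified model for illustration only, not Pentaho or Hadoop code. The JobSetup class, its field names, the helper functions, the /wordcount/extra path, and the use of None as a stand-in for NullWritable are all assumptions made for this example. Trimming whitespace around commas is also an assumption of the sketch, not documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class JobSetup:
    """Hypothetical stand-in for the Job Setup tab fields (not a Pentaho class)."""
    input_path: str
    output_path: str
    remove_output_path_before_job: bool = False
    ignore_map_key: bool = False
    ignore_map_value: bool = False


def split_input_paths(input_path: str) -> List[str]:
    """Split a comma-separated Input path value into individual directories.

    Whitespace trimming is an assumption of this sketch.
    """
    return [part.strip() for part in input_path.split(",") if part.strip()]


def output_path_problem(setup: JobSetup, existing_dirs: Set[str]) -> Optional[str]:
    """Return a message if the output directory would block the job, else None.

    existing_dirs is an assumed snapshot of directories already on the cluster.
    """
    already_there = setup.output_path in existing_dirs
    if already_there and not setup.remove_output_path_before_job:
        return f"Output path {setup.output_path} already exists"
    return None


def apply_map_ignore_flags(setup: JobSetup, key: object, value: object) -> Tuple[object, object]:
    """Model the 'Ignore output of map ...' options; None stands in for NullWritable."""
    return (None if setup.ignore_map_key else key,
            None if setup.ignore_map_value else value)


if __name__ == "__main__":
    setup = JobSetup(
        input_path="/wordcount/input,/wordcount/extra",  # second path is hypothetical
        output_path="/wordcount/output",
        ignore_map_key=True,
    )
    print(split_input_paths(setup.input_path))
    print(output_path_problem(setup, existing_dirs={"/wordcount/output"}))
    print(apply_map_ignore_flags(setup, "the", 3))
```

In this model, an existing output directory is reported as a problem unless the remove-output-path flag is set, which mirrors the note on the Output path option and the purpose of Remove output path before job.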