The following table describes the options for setting up the inputs and outputs of the job:
Option | Definition |
---|---|
Input path | Enter the path of the input directory on your Hadoop cluster, such as /wordcount/input, where the source data for the MapReduce job is stored. Use a comma-separated list to specify multiple input directories. |
Output path | Enter the path of the directory, such as /wordcount/output, on your Hadoop cluster where you want the output from the MapReduce job to be stored. Note: The output directory cannot exist prior to running the MapReduce job. |
Remove output path before job | Select to remove the specified output path before the MapReduce job is scheduled. |
Input format | Enter the Apache Hadoop class name that describes the input specification for the MapReduce job. See InputFormat for more information. |
Output format | Enter the Apache Hadoop class name that describes the output specification for the MapReduce job. See OutputFormat for more information. |
Ignore output of map key | Select to ignore the key output from the mapper transformation and replace it with NullWritable. |
Ignore output of map value | Select to ignore the value output from the mapper transformation and replace it with NullWritable. |
Ignore output of reduce key | Select to ignore the key output from the combiner and/or reducer transformations and replace it with NullWritable. This option requires a reducer transformation, not the Identity Reducer. |
Ignore output of reduce value | Select to ignore the value output from the combiner and/or reducer transformations and replace it with NullWritable. This option requires a reducer transformation, not the Identity Reducer. |
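
The path options in the table can be illustrated with a small sketch. The Java below is not the Hadoop or Pentaho API: `JobPathPreflight`, `splitInputPaths`, and `prepareOutputPath` are hypothetical names, and the code works on the local filesystem only. It assumes that a comma-separated Input path is split into individual directories (tolerating spaces after commas, which is a convenience of this sketch), and that an existing output directory is an error unless the "Remove output path before job" behavior is requested.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of Hadoop or Pentaho: it mimics the path
// options from the table on the local filesystem, for illustration only.
final class JobPathPreflight {

    // Splits an "Input path" value such as "/a,/b" into individual directories.
    // Trimming whitespace and dropping empty entries is an assumption of this sketch.
    static List<String> splitInputPaths(String inputPath) {
        List<String> paths = new ArrayList<>();
        for (String part : inputPath.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                paths.add(trimmed);
            }
        }
        return paths;
    }

    // Mimics the output path rule: the directory must not exist before the job,
    // unless removeBeforeJob is set, in which case it is deleted recursively.
    static void prepareOutputPath(Path outputPath, boolean removeBeforeJob) throws IOException {
        if (!Files.exists(outputPath)) {
            return; // Nothing to clean up; the job can create the directory.
        }
        if (!removeBeforeJob) {
            throw new IllegalStateException("Output path already exists: " + outputPath);
        }
        // Reverse order visits children before their parent directories.
        try (Stream<Path> walk = Files.walk(outputPath)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(splitInputPaths("/wordcount/input, /wordcount/extra"));

        // Simulate leftover output from an earlier run, then remove it.
        Path out = Files.createTempDirectory("wordcount-output");
        Files.createFile(out.resolve("part-r-00000"));
        prepareOutputPath(out, true);
        System.out.println("exists after removal: " + Files.exists(out));
    }
}
```

Calling `prepareOutputPath(out, false)` on an existing directory throws an exception in this sketch, which corresponds to the note that the output directory cannot exist before the job runs.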