The Hadoop Job Executor and Pentaho MapReduce steps have an advanced configuration mode that lets you specify data types for the job's input and output. PDI cannot detect foreign data types on its own, so you must specify the input and output data types in the Job Setup tab.
The following table shows the mapping between Hadoop data types and their PDI (Kettle) equivalents.
| PDI (Kettle) Data Type | Apache Hadoop Data Type |
|---|---|
| java.lang.Integer | org.apache.hadoop.io.IntWritable |
| java.lang.Long | org.apache.hadoop.io.IntWritable |
| java.lang.Long | org.apache.hadoop.io.LongWritable |
| org.apache.hadoop.io.IntWritable | java.lang.Long |
| java.lang.String | org.apache.hadoop.io.Text |
| java.lang.String | org.apache.hadoop.io.IntWritable |
| org.apache.hadoop.io.LongWritable | org.apache.hadoop.io.Text |
| org.apache.hadoop.io.LongWritable | java.lang.Long |
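If it helps to see what these pairings mean in code, the sketch below (a hypothetical standalone class, not part of PDI; the class and variable names are illustrative) shows how each Hadoop Writable type in the table wraps and unwraps its plain Java counterpart. This is the kind of conversion PDI performs when it hands fields to a MapReduce job and reads results back. It assumes only that the Hadoop client libraries (hadoop-common) are on the classpath.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Hypothetical demo of the type pairs in the table above.
public class WritableTypeDemo {
    public static void main(String[] args) {
        // java.lang.Integer <-> org.apache.hadoop.io.IntWritable
        Integer kettleInteger = 42;
        IntWritable intWritable = new IntWritable(kettleInteger);
        int unwrappedInt = intWritable.get();

        // java.lang.Long <-> org.apache.hadoop.io.LongWritable
        Long kettleLong = 42L;
        LongWritable longWritable = new LongWritable(kettleLong);
        long unwrappedLong = longWritable.get();

        // java.lang.String <-> org.apache.hadoop.io.Text
        String kettleString = "some value";
        Text text = new Text(kettleString);
        String unwrappedString = text.toString();

        System.out.printf("%d %d %s%n", unwrappedInt, unwrappedLong, unwrappedString);
    }
}
```

One caveat worth noting: a conversion such as java.lang.Long to IntWritable narrows a 64-bit value to 32 bits, so values outside the int range may not survive the round trip.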
For more information on configuring Pentaho MapReduce to convert to additional data types, see Pentaho MapReduce.