Hadoop to PDI data type conversion

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

The Hadoop Job Executor and Pentaho MapReduce steps have an advanced configuration mode in which you can specify data types for the job's input and output. Because PDI cannot detect foreign data types on its own, you must specify the input and output data types in the Job Setup tab.

The following table lists the Hadoop data types and their PDI equivalents.

PDI (Kettle) Data Type             Apache Hadoop Data Type
java.lang.Integer                  org.apache.hadoop.io.IntWritable
java.lang.Long                     org.apache.hadoop.io.IntWritable
java.lang.Long                     org.apache.hadoop.io.LongWritable
org.apache.hadoop.io.IntWritable   java.lang.Long
java.lang.String                   org.apache.hadoop.io.Text
java.lang.String                   org.apache.hadoop.io.IntWritable
org.apache.hadoop.io.LongWritable  org.apache.hadoop.io.Text
org.apache.hadoop.io.LongWritable  java.lang.Long
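As an illustration only, the rows of the table above can be transcribed into a small lookup for checking whether a pair of types you plan to enter in the Job Setup tab appears in the table. This is a minimal sketch, not part of PDI or Hadoop: the names CONVERSION_ROWS, is_listed, and paired_types are hypothetical, and the class names are handled as plain strings, so no Hadoop libraries are needed.

```python
# Rows of the table above, transcribed as (left column, right column) pairs.
# The structure and helper names are assumptions made for illustration;
# they are not part of the PDI or Hadoop APIs.
CONVERSION_ROWS = [
    ("java.lang.Integer", "org.apache.hadoop.io.IntWritable"),
    ("java.lang.Long", "org.apache.hadoop.io.IntWritable"),
    ("java.lang.Long", "org.apache.hadoop.io.LongWritable"),
    ("org.apache.hadoop.io.IntWritable", "java.lang.Long"),
    ("java.lang.String", "org.apache.hadoop.io.Text"),
    ("java.lang.String", "org.apache.hadoop.io.IntWritable"),
    ("org.apache.hadoop.io.LongWritable", "org.apache.hadoop.io.Text"),
    ("org.apache.hadoop.io.LongWritable", "java.lang.Long"),
]


def is_listed(left, right):
    """Return True if (left, right) appears as a row of the table."""
    return (left, right) in CONVERSION_ROWS


def paired_types(type_name):
    """Return every type that shares a row with type_name, in either column."""
    partners = set()
    for left, right in CONVERSION_ROWS:
        if left == type_name:
            partners.add(right)
        elif right == type_name:
            partners.add(left)
    return sorted(partners)


if __name__ == "__main__":
    print(is_listed("java.lang.String", "org.apache.hadoop.io.Text"))
    print(paired_types("org.apache.hadoop.io.Text"))
```

The lookup only mirrors what the table states. It does not perform any conversion and makes no claim about types that are not listed.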

For more information on configuring Pentaho MapReduce to convert to additional data types, see Pentaho MapReduce.