Using the Apache Hadoop driver

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

You can use the preinstalled Apache Hadoop driver for HDFS file copy operations and for running input and output transformations and jobs. The driver works with both secure and unsecured clusters. Because the driver ships already installed, you do not need to install a KAR file.

The supported big data steps in Pentaho include:

Both operating system file browsers and the Pentaho virtual file system browsers are supported, as well as basic HDFS and VFS operations. For more information, see Connecting to Virtual File Systems.
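As an illustration of how an HDFS file is typically addressed in a VFS-style URL, the following minimal sketch composes an hdfs:// location with the standard java.net.URI class. The class name HdfsUrlSketch, the host namenode.example.com, the port 8020, and the file path are all hypothetical placeholders, not values from this documentation; substitute the details of your own cluster.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of Pentaho or Hadoop.
final class HdfsUrlSketch {

    // Builds an hdfs:// URL from a NameNode host, port, and absolute file path.
    static String buildHdfsUrl(String host, int port, String path) {
        if (path == null || !path.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + path);
        }
        try {
            // Arguments: scheme, userInfo, host, port, path, query, fragment.
            // This constructor percent-encodes characters that are illegal in a path.
            URI uri = new URI("hdfs", null, host, port, path, null, null);
            return uri.toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid HDFS location", e);
        }
    }

    public static void main(String[] args) {
        // Assumed example values; replace with your cluster's NameNode and path.
        System.out.println(
            buildHdfsUrl("namenode.example.com", 8020, "/user/pdi/input/sales.csv"));
    }
}
```

A URL built this way could then be entered wherever a step or job entry accepts a file location. Whether your environment expects this form or a named-cluster form depends on how the connection is configured, so treat the sketch as a starting point only.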

Note: The Apache Hadoop driver works only with Hadoop clusters that conform to standard Hadoop connection rules. Hortonworks, Cloudera, and EMR clusters may work, but MapR does not, because its connection rules are not standard. The Apache Hadoop driver is not intended to support higher-level Hadoop operations such as Hive, HBase, Sqoop, and Oozie. If you require these operations, install the KAR file for the applicable vendor.