You can use the preinstalled Apache Hadoop driver for HDFS file copy operations and for running input and output transformations and jobs. The driver works with both secured and unsecured clusters. Because the driver ships preinstalled, you do not need to install a KAR file.
The supported big data steps in Pentaho include:
Both operating system file browsers and the Pentaho virtual file system browsers are supported, as well as basic HDFS and VFS operations. For more information, see Connecting to Virtual File Systems.
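As a rough illustration of how an HDFS location is typically addressed, the sketch below assembles a standard `hdfs://host:port/path` URL. The helper name `hdfs_url`, the host `namenode.example.com`, the port `8020`, and the path are all hypothetical values chosen for the example; they do not come from the Pentaho documentation, so substitute the values for your own cluster.

```python
from urllib.parse import quote, urlunsplit


def hdfs_url(host: str, port: int, path: str) -> str:
    """Build an hdfs:// URL from a NameNode host, port, and absolute path.

    Hypothetical helper for illustration only; not part of Pentaho or Hadoop.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # quote() percent-encodes characters such as spaces but leaves "/" intact.
    return urlunsplit(("hdfs", f"{host}:{port}", quote(path), "", ""))


# Example values are assumptions, not defaults taken from the documentation.
print(hdfs_url("namenode.example.com", 8020, "/user/pentaho/input.csv"))
```

A URL of this shape is the usual way to point at a file on a standard Hadoop cluster; check your cluster configuration for the actual NameNode host and port.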
Note: Only Hadoop clusters that conform to standard Hadoop connection rules work with the Apache Hadoop Driver. While Hortonworks, Cloudera, and EMR clusters may work, MapR does not work with this driver because its connection rules are not standard. The Apache Hadoop Driver is not intended to support higher-level Hadoop operations such as Hive, HBase, Sqoop, and Oozie. If you require these operations, install the KAR file for the applicable vendor.