Using the ORC Input step on the Spark engine

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

You can set up the ORC Input step to run on the Spark engine. Spark processes null values differently than the Pentaho engine does, so you may need to adjust your transformation so that null values are handled correctly under Spark's rules.
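
As a rough illustration of why null handling may need attention, the following sketch models in plain Python the three-valued logic that Spark SQL applies to comparisons: a comparison involving a null evaluates to null rather than to true or false, and a filter keeps only the rows whose condition is true. The helper names (sql_equals, sql_not, sql_filter) and the sample rows are hypothetical. They are not part of PDI or Spark, and the sketch does not describe how the ORC Input step itself is implemented.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]

def sql_equals(a: Any, b: Any) -> Optional[bool]:
    """SQL-style equality: a comparison involving null (None) yields null."""
    if a is None or b is None:
        return None
    return a == b

def sql_not(value: Optional[bool]) -> Optional[bool]:
    """SQL-style NOT: negating null still yields null."""
    if value is None:
        return None
    return not value

def sql_filter(rows: List[Row], predicate: Callable[[Row], Optional[bool]]) -> List[Row]:
    """Keep only rows whose predicate is strictly True, as a WHERE clause does."""
    return [row for row in rows if predicate(row) is True]

# Hypothetical sample data: row 2 has a null in the "code" field.
rows = [
    {"id": 1, "code": "A"},
    {"id": 2, "code": None},
    {"id": 3, "code": "B"},
]

# Rows where code = 'A'
matches = sql_filter(rows, lambda r: sql_equals(r["code"], "A"))

# Rows where NOT (code = 'A')
non_matches = sql_filter(rows, lambda r: sql_not(sql_equals(r["code"], "A")))

print([r["id"] for r in matches])
print([r["id"] for r in non_matches])
```

In this model, the row with the null code appears in neither result, because its comparison is neither true nor false. If a transformation relies on such rows being routed one way or the other, it may need an explicit null check or a default value before the comparison.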

Because of Cloudera Distribution Spark (CDS) limitations, when the step runs through the Adaptive Execution Layer (AEL), it cannot read Hive tables whose data files are in the ORC format from Spark applications in YARN mode. As an alternative, you can use the Parquet format for the columnar data and access it through Impala.