Using the Avro Input step on the Spark engine

Pentaho Data Integration

Version: 9.3.x
Part Number: MK-95PDIA003-15

You can set up the Avro Input step to run on the Spark engine. Spark processes null values differently from the Pentaho engine, so you may need to adjust your transformation to process null values successfully under Spark's rules.

Additionally, when using the Avro Input step on an Amazon EMR cluster, you must copy the spark-avro_2.11-2.4.2.jar file from the external/lib directory under your SPARK_HOME folder into the extra folder of your AEL data-integration setup location. The following is an example command to copy the file:
cp /usr/lib/spark/external/lib/spark-avro_2.11-2.4.2.jar <User>/data-integration/adaptive-execution/extra/