Parquet Output

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

The Parquet Output step allows you to map PDI fields to fields within data files and choose where you want to process those files, such as on HDFS. For big data users, the Parquet Input and Parquet Output steps enable you to gather data from various sources and move that data into the Hadoop ecosystem in the Parquet format. Depending on your setup, you can execute the transformation within PDI, or within the Adaptive Execution Layer (AEL) using Spark as the processing engine.

Before using the Parquet Output step, you must configure a named connection for your distribution, even if Location is set to Local. For information on named connections, see Connecting to a Hadoop cluster with the PDI client.