Options tab

Pentaho Data Integration

Version: 9.3.x
Part Number: MK-95PDIA003-15
Parquet Output step Options tab

In the Options tab, you can define properties for the file output.

Option Description
Compression
Specify the codec to use to compress the Parquet Output file:
None
No compression is used (default).
Snappy
Uses Google's Snappy compression library. Each data block is written followed by the 4-byte, big-endian CRC32 checksum of the uncompressed data in that block.
GZIP
Uses a compression format that is based on the Deflate algorithm.
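
The two codec descriptions above can be illustrated with a minimal sketch using only the Python standard library. It is not the step's implementation. The helper names (`frame_with_crc32`, `read_framed_block`, `gzip_uses_deflate`) are hypothetical, and because Snappy is not in the standard library, zlib stands in as the compressor so that only the "block followed by a 4-byte, big-endian CRC32" layout is shown.

```python
import gzip
import struct
import zlib


def frame_with_crc32(uncompressed: bytes) -> bytes:
    """Return a compressed block followed by a 4-byte big-endian CRC32.

    zlib (Deflate) is only a stand-in compressor here; a real Snappy
    codec is not part of the Python standard library.
    """
    compressed = zlib.compress(uncompressed)
    # The checksum covers the UNCOMPRESSED data, as the description says.
    checksum = zlib.crc32(uncompressed) & 0xFFFFFFFF
    return compressed + struct.pack(">I", checksum)  # ">I" = big-endian uint32


def read_framed_block(block: bytes) -> bytes:
    """Split off the trailing checksum, decompress, and verify the data."""
    compressed, trailer = block[:-4], block[-4:]
    data = zlib.decompress(compressed)
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(data) & 0xFFFFFFFF != expected:
        raise ValueError("CRC32 mismatch")
    return data


def gzip_uses_deflate(payload: bytes) -> bool:
    """Check the gzip header: magic bytes 1f 8b, then method 8 (Deflate)."""
    stream = gzip.compress(payload)
    return stream[:2] == b"\x1f\x8b" and stream[2] == 8
```

A round trip such as `read_framed_block(frame_with_crc32(data))` returns the original bytes, and a corrupted trailer raises `ValueError`.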
Version

Specify the version of Parquet you want to use:

  • Parquet 1.0
  • Parquet 2.0
Row group size (MB)
Specify the row group size, in megabytes. The default value is 0.
Data page size (KB)
Specify the data page size, in kilobytes. The default value is 0.
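
Because the two sizes use different units, a short sketch of the unit conversion may help. The keys `parquet.block.size` and `parquet.page.size` are configuration keys of the parquet-mr (parquet-hadoop) writer and take byte counts. That the step sets exactly these keys is an assumption made for illustration, and the function name `size_properties` is hypothetical.

```python
# Real parquet-mr (parquet-hadoop) configuration keys. That the PDI step
# sets exactly these keys is an assumption made for illustration.
ROW_GROUP_KEY = "parquet.block.size"
DATA_PAGE_KEY = "parquet.page.size"


def size_properties(row_group_mb: int, data_page_kb: int) -> dict:
    """Convert dialog values (MB and KB) to byte counts keyed by property."""
    if row_group_mb < 0 or data_page_kb < 0:
        raise ValueError("sizes must not be negative")
    return {
        ROW_GROUP_KEY: row_group_mb * 1024 * 1024,  # megabytes -> bytes
        DATA_PAGE_KEY: data_page_kb * 1024,         # kilobytes -> bytes
    }


print(size_properties(128, 1024))
# {'parquet.block.size': 134217728, 'parquet.page.size': 1048576}
```

The sketch passes a value of 0 through unchanged and does not interpret it.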
Dictionary encoding
Specify whether to use dictionary encoding, which builds a dictionary of the values encountered in a column. The dictionary page is written first, before the data pages of the column. If the dictionary grows larger than the Page size, whether in size or in number of distinct values, the encoding reverts to plain encoding.
Page size (KB)
Specify the dictionary page size, in kilobytes, when using dictionary encoding. The default value is 1024.
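
The dictionary-with-fallback behavior described above can be sketched as follows. This is a simplified illustration rather than the Parquet writer's actual logic: the function name `encode_column` is hypothetical, it handles only string columns, it measures dictionary size as the UTF-8 byte length of the distinct values, and it falls back for the whole column at once.

```python
def encode_column(values, page_size_bytes):
    """Dictionary-encode a column of strings, or fall back to plain.

    Returns ("dictionary", dictionary, indexes) while the dictionary fits
    in page_size_bytes, otherwise ("plain", values, None).
    """
    values = list(values)
    dictionary = []       # distinct values in first-seen order
    positions = {}        # value -> index into dictionary
    indexes = []          # one dictionary index per input row
    dictionary_bytes = 0

    for value in values:
        if value not in positions:
            dictionary_bytes += len(value.encode("utf-8"))
            if dictionary_bytes > page_size_bytes:
                # Dictionary outgrew its page: revert to plain encoding.
                return ("plain", values, None)
            positions[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(positions[value])

    return ("dictionary", dictionary, indexes)
```

For example, `encode_column(["a", "b", "a", "c", "b"], 1024)` stores three dictionary entries and five small indexes, while a column whose distinct values exceed the page size comes back unchanged as plain values.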
Extension
Select the extension for your output file. The default value is parquet.
Include date in file name
Adds the system date to the file name in yyyyMMdd format (for example, 20181231).
Include time in file name
Adds the system time to the file name in HHmmss format (for example, 235959).
Specify date time format
Select the date-time format from the drop-down list.
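
As a rough illustration of the date and time options, the sketch below builds a file name using the Python equivalents of the Java patterns yyyyMMdd (`%Y%m%d`) and HHmmss (`%H%M%S`). The function name `output_file_name`, the underscore separators, and the order of the parts are assumptions; the step's actual file name layout may differ.

```python
from datetime import datetime


def output_file_name(base, extension="parquet", include_date=False,
                     include_time=False, now=None):
    """Build a name like base_20181231_235959.parquet.

    The underscore separators and the order of the parts are assumptions;
    the step's actual layout may differ.
    """
    now = now or datetime.now()
    parts = [base]
    if include_date:
        parts.append(now.strftime("%Y%m%d"))  # Java pattern yyyyMMdd
    if include_time:
        parts.append(now.strftime("%H%M%S"))  # Java pattern HHmmss
    return "_".join(parts) + "." + extension
```

Passing a fixed `now`, such as `datetime(2018, 12, 31, 23, 59, 59)`, makes the result reproducible when testing.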