Options tab

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15
ORC Output step Options tab

The following options in the Options tab define how the ORC output file will be created.

Field Description
Compression

Specifies which codec is used to compress the ORC output file:

None
No compression is used (default).
Zlib
Writes the data blocks using the deflate algorithm, as specified in RFC 1951, and typically implemented using the zlib library.
LZO
Writes the data blocks using LZO encoding, which works well for CHAR and VARCHAR columns that store very long character strings.
Snappy
Writes the data blocks using Google's Snappy compression library, which trades some compression ratio for high compression and decompression speed.
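The Zlib option above can be illustrated with a short sketch. It compresses one block as a raw DEFLATE (RFC 1951) stream using Python's standard zlib module. The function names and the sample data are hypothetical, and the sketch does not reproduce how the ORC writer frames its blocks; it only shows the codec applied to repetitive column data.

```python
import zlib


def deflate_block(block: bytes) -> bytes:
    """Compress one block as a raw DEFLATE (RFC 1951) stream."""
    # A negative wbits value asks zlib for raw deflate output,
    # without the zlib header and trailing checksum.
    compressor = zlib.compressobj(level=6, wbits=-15)
    return compressor.compress(block) + compressor.flush()


def inflate_block(compressed: bytes) -> bytes:
    """Reverse deflate_block and return the original bytes."""
    return zlib.decompress(compressed, -15)


# Hypothetical sample: a string column with many repeated values.
block = b"".join(b"customer-%05d;" % (i % 50) for i in range(20_000))
packed = deflate_block(block)

print(len(block), len(packed))  # raw size, then compressed size
```

Repetitive data such as this usually shrinks considerably, which is the case where choosing a codec other than None pays off.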
Stripe size (MB)
Defines the stripe size in megabytes. An ORC file has one or more stripes. Each stripe is composed of rows of data, an index of the data, and a footer containing metadata about the stripe’s contents. Large stripe sizes enable efficient reads from HDFS. The default is 64. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC for additional information.
Compress size (KB)
Defines the number of kilobytes in each compression chunk. The default is 256.
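As a rough illustration of how the two size settings relate, the following sketch counts how many compression chunks a given amount of data needs and splits a buffer into chunks of that size. It is a simplification that treats the data as a single stream; the helper names are hypothetical and are not part of the step.

```python
KB = 1024
MB = 1024 * KB


def chunk_count(stream_bytes: int, compress_size_kb: int = 256) -> int:
    """Number of compression chunks needed for a stream of this size."""
    if compress_size_kb <= 0:
        raise ValueError("compress size must be positive")
    chunk_bytes = compress_size_kb * KB
    # Integer ceiling division: a partial last chunk still counts as one.
    return (stream_bytes + chunk_bytes - 1) // chunk_bytes


def split_into_chunks(data: bytes, compress_size_kb: int = 256) -> list:
    """Cut a buffer into pieces of at most compress_size_kb kilobytes."""
    chunk_bytes = compress_size_kb * KB
    return [data[i:i + chunk_bytes] for i in range(0, len(data), chunk_bytes)]


# With the defaults, 64 MB of data splits into 256 chunks of 256 KB.
print(chunk_count(64 * MB))  # → 256
```

Smaller chunks let a reader decompress less data to reach a given row, while larger chunks give the codec more context to work with.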
Inline Indexes
If checked, rows are indexed when written for faster filtering and random access on read.
Rows between entries
Defines the stride size, that is, the number of rows between index entries (must be greater than or equal to 1000). A stride is the block of rows that the ORC reader can skip during a read operation based on the indexes. The default is 10000.
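The effect of the stride can be sketched as follows. The example keeps one index entry per stride with the minimum and maximum value found in it, then uses those entries to decide which strides a lookup has to read. This is a simplified model on a single integer column, assuming min/max statistics per entry; the function names are hypothetical and the real index holds more detail.

```python
from typing import List, Tuple

IndexEntry = Tuple[int, int, int]  # (first row of the stride, min, max)


def build_index(values: List[int], stride: int = 10000) -> List[IndexEntry]:
    """Create one index entry for every `stride` rows."""
    if stride < 1000:
        raise ValueError("stride must be greater than or equal to 1000")
    index = []
    for start in range(0, len(values), stride):
        block = values[start:start + stride]
        index.append((start, min(block), max(block)))
    return index


def strides_to_read(index: List[IndexEntry], target: int) -> List[int]:
    """Return the starting rows of the strides that may contain target."""
    return [start for start, low, high in index if low <= target <= high]


values = list(range(50_000))        # hypothetical sorted column
index = build_index(values)         # five entries with the default stride
print(strides_to_read(index, 23_456))  # → [20000]
```

In this model four of the five strides are skipped. A smaller stride allows finer skipping at the cost of more index entries.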
Include date in file name
Adds the system date to the filename with format yyyyMMdd (20181231 for example).
Include time in file name
Adds the system time to the filename with format HHmmss (235959 for example).
Specify date time format
Select this option to choose the date time format from the dropdown list.
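The date and time options can be pictured with a small sketch that builds a file name from the example values 20181231 and 235959. The function name, the underscore separator, and the position of the stamps before the extension are assumptions made for illustration; the step itself may arrange the name differently.

```python
from datetime import datetime


def output_filename(base, extension="orc", when=None,
                    include_date=False, include_time=False):
    """Build a file name with optional date and time stamps."""
    when = when or datetime.now()
    parts = [base]
    if include_date:
        parts.append(when.strftime("%Y%m%d"))  # same layout as 20181231
    if include_time:
        parts.append(when.strftime("%H%M%S"))  # same layout as HHmmss
    # Separator and ordering are assumptions, not the step's documented layout.
    return "_".join(parts) + "." + extension


stamp = datetime(2018, 12, 31, 23, 59, 59)
print(output_filename("sales", when=stamp, include_date=True, include_time=True))
# → sales_20181231_235959.orc
```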
Important: Due to licensing constraints, ORC does not ship with LZO compression libraries; these must be manually installed on each node if you want to use LZO compression.