The following options in the Options tab define how the ORC output file will be created.
Field | Description |
---|---|
Compression | Specifies which codec is used to compress the ORC output file. |
Stripe size (MB) | Defines the stripe size in megabytes. An ORC file has one or more stripes. Each stripe is composed of rows of data, an index of the data, and a footer containing metadata about the stripe’s contents. Large stripe sizes enable efficient reads from HDFS. The default is 64. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC for additional information. |
Compress size (KB) | Defines the number of kilobytes in each compression chunk. The default is 256. |
Inline Indexes | If checked, rows are indexed when written for faster filtering and random access on read. |
Rows between entries | Defines the stride size or number of rows between index entries (must be greater than or equal to 1000). The stride size is the block of data that can be skipped by the ORC reader during a read operation based on the indexes. The default is 10000. |
Include date in file name | Adds the system date to the filename with format yyyyMMdd (20181231 for example). |
Include time in file name | Adds the system time to the filename with format HHmmss (235959 for example). |
Specify date time format | Select to specify the date time format using the dropdown list. |
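
The size and index options above are entered in megabytes, kilobytes, and rows. As a rough illustration of how they relate to each other, the sketch below converts them into the byte-based ORC table properties described in the Hive ORC manual linked above (`orc.compress`, `orc.stripe.size`, `orc.compress.size`, `orc.create.index`, `orc.row.index.stride`). The function `orc_properties` is hypothetical, and the assumption that each field maps one-to-one onto these properties is ours; it is not taken from the step itself.

```python
# Hypothetical helper: the mapping from Options-tab fields to ORC property
# names is an assumption made for illustration only.

def orc_properties(compression, stripe_size_mb=64, compress_size_kb=256,
                   inline_indexes=True, rows_between_entries=10000):
    """Return a dict of ORC properties for the given Options-tab values."""
    # The table states that the stride must be at least 1000 rows.
    if rows_between_entries < 1000:
        raise ValueError("Rows between entries must be >= 1000")
    return {
        "orc.compress": compression.upper(),
        "orc.stripe.size": stripe_size_mb * 1024 * 1024,   # MB -> bytes
        "orc.compress.size": compress_size_kb * 1024,      # KB -> bytes
        "orc.create.index": inline_indexes,
        "orc.row.index.stride": rows_between_entries,
    }


# ZLIB is one of the values the Hive ORC manual lists for orc.compress.
props = orc_properties("zlib")
print(props["orc.stripe.size"])    # 67108864
print(props["orc.compress.size"])  # 262144
```

With the defaults from the table (64 MB stripes, 256 KB compression chunks, 10000 rows per index entry), the resulting values match the defaults listed for the corresponding properties in the Hive ORC manual.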

Important: Due to licensing constraints, ORC does not ship with LZO compression libraries; these must be manually installed on each node if you want to use LZO compression.
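
The date and time options in the table can be pictured with a short sketch. It renders the system date as `yyyyMMdd` and the time as `HHmmss` using the equivalent Python `strftime` patterns. The function `append_timestamp`, the underscore separator, and the position of the stamps after the base name are all assumptions for illustration; the actual placement in the generated filename may differ.

```python
from datetime import datetime


def append_timestamp(base_name, now, include_date=False, include_time=False):
    """Append date and/or time stamps to a base filename (hypothetical layout)."""
    parts = [base_name]
    if include_date:
        parts.append(now.strftime("%Y%m%d"))  # same output as yyyyMMdd
    if include_time:
        parts.append(now.strftime("%H%M%S"))  # same output as HHmmss
    # The "_" separator is an assumption, not documented behavior.
    return "_".join(parts)


stamp = datetime(2018, 12, 31, 23, 59, 59)
print(append_timestamp("sales", stamp, include_date=True, include_time=True))
# sales_20181231_235959
```

Passing the timestamp in as an argument, rather than calling `datetime.now()` inside the function, keeps the example reproducible.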