Options

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

CSV File Input step

The CSV File Input step has the following options:

Option Description
Step name Specify the unique name of the CSV File Input step on the canvas. You can customize the name or leave it as the default.
Filename Specify the name of the input CSV file or navigate to the input file by clicking Browse.

If your source is from a previous step, the Browse button is hidden. Use the drop-down menu in the text box to select the field that holds the name or names of your CSV file(s).

Include the filename in the output? Select to include the name of the input source file in the output. This option appears only if your source is from a previous step.
Delimiter Specify the file delimiter character used in the source file. Special characters (for example, CHAR HEX01) can be set with the format $[value]. For example, $[01] or $[6F,FF,00,1F].

The default delimiter for the CSV File Input step is a semicolon (;).
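The $[value] notation described above can be illustrated with a short Python sketch. This is not PDI code: decode_special is a hypothetical helper, and it assumes that each comma-separated value inside the brackets is one byte written as two hexadecimal digits, as the examples $[01] and $[6F,FF,00,1F] suggest.

```python
import re

# Hypothetical helper, not part of PDI. Assumption: every comma-separated
# value inside $[...] is a single byte written as two hex digits.
_SPECIAL = re.compile(r"^\$\[([0-9A-Fa-f]{2}(?:,[0-9A-Fa-f]{2})*)\]$")


def decode_special(value: str) -> bytes:
    """Return the bytes denoted by a $[...] value.

    Values that do not use the $[...] notation are returned
    unchanged, encoded as UTF-8.
    """
    match = _SPECIAL.match(value)
    if match is None:
        return value.encode("utf-8")
    return bytes(int(part, 16) for part in match.group(1).split(","))
```

Under these assumptions, decode_special("$[01]") yields the single byte 0x01, and a plain delimiter such as ";" passes through unchanged.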

Enclosure Specify the enclosure character used in the source file. Special characters (for example, CHAR HEX01) can be set with the format $[value], such as $[01] or $[6F,FF,00,1F].
NIO buffer size Specify the size of the read buffer, which is the number of bytes read from the source at one time.
Lazy conversion? Indicate if the lazy conversion algorithm may be used to improve performance. The algorithm avoids unnecessary data type conversions where possible, which can result in significant performance improvements. The typical example is reading from a text file and writing back to a text file.
Header row present? Indicate if the source file contains a header row containing column names.
Add filename to result Select to add the CSV source filename(s) to the result of this transformation.
The row number field name (optional) Specify the name of the field that will contain the row number in the output of this step.
Running in parallel? Indicate if you will have multiple instances of this step running (step copies) and if you want each instance to read a separate part of the CSV file(s).

When reading multiple files, the total size of all files is taken into consideration to split the workload. In that case, make sure that every step copy receives all the files that need to be read. Otherwise, the parallel algorithm will not work correctly.

Caution: For technical reasons, parallel reading of CSV files is only supported on files that do not have fields with line breaks or carriage returns in them.
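The following Python sketch illustrates one way a workload could be divided by total size across step copies. It is not the algorithm PDI uses: split_workload and its byte-range output are assumptions made for illustration only. It does show why every copy needs the complete file list, since each copy derives its share from the combined size of all files.

```python
def split_workload(file_sizes, copies):
    """Return, for each step copy, a list of (file_index, start, end) byte ranges.

    Hypothetical illustration only. The files are treated as one
    contiguous stream whose length is the sum of all file sizes, and
    each copy is given an equal slice of that stream.
    """
    total = sum(file_sizes)
    plans = []
    for copy in range(copies):
        # Slice of the combined stream assigned to this copy.
        lo = total * copy // copies
        hi = total * (copy + 1) // copies
        plan = []
        offset = 0  # position of the current file within the combined stream
        for index, size in enumerate(file_sizes):
            start = max(lo, offset)
            end = min(hi, offset + size)
            if start < end:
                # Convert stream positions back to offsets within the file.
                plan.append((index, start - offset, end - offset))
            offset += size
        plans.append(plan)
    return plans
```

A real reader would also have to move each range boundary to the start of a line. That adjustment is presumably what becomes unreliable when fields themselves contain line breaks or carriage returns.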

New line possible in fields? Indicate if data fields may contain new line characters.
Format Select the file format, which can be DOS, UNIX, or mixed. UNIX files have lines terminated by line feeds. DOS files have lines terminated by carriage returns and line feeds. If you specify mixed, no verification is done.
File encoding Specify the encoding of the source file.
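If you are unsure which Format value applies to a file, the line endings can be inspected before configuring the step. The sketch below is a standalone Python check, not a PDI feature. detect_format is a hypothetical name, and the labels it returns simply mirror the three Format values described above.

```python
def detect_format(data: bytes) -> str:
    """Classify line endings in a sample of bytes as DOS, UNIX, or mixed.

    Hypothetical helper for illustration. A sample that contains no
    line breaks at all is reported as "mixed".
    """
    crlf = data.count(b"\r\n")          # DOS-style terminators
    lf_only = data.count(b"\n") - crlf  # line feeds without a preceding carriage return
    if crlf and not lf_only:
        return "DOS"
    if lf_only and not crlf:
        return "UNIX"
    return "mixed"
```

For example, passing the first few kilobytes of a file, read in binary mode, is usually enough to decide which value to select.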