The CSV File Input step reads data from delimited text files into a PDI transformation. Despite its name, the step is not limited to comma-separated files: it also works with other separator types, such as pipes, tabs, and semicolons.
Note: The semicolon (;) is set as the default separator type for this step.
Note: The options for this step are a subset of those for the Text File Input step. This step differs from the Text File Input step in the following ways:
- NIO
  - Non-blocking I/O is used for native system calls to read the file faster, but is limited to local files. It does not support VFS.
- Parallel Running
  - If you configure this step to run in multiple copies (or in a clustered mode) and you enable parallel running, each copy will read a separate block of a single file. You can distribute the reading of a file to several threads or even several slave nodes in a clustered transformation.
- Lazy Conversion
  - If you are reading many fields from a file and many of those fields will not be manipulated but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent PDI from performing unnecessary work on those fields (such as converting them into objects like strings, dates, or numbers).
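
The ideas behind parallel running and lazy conversion can be illustrated outside PDI. The Python sketch below is not PDI's implementation; the names block_ranges, read_block, and LazyField are hypothetical and exist only for this illustration. It assumes that a row belongs to the block in which its first byte falls, that rows end with a newline, and that fields contain no embedded newlines.

```python
def block_ranges(file_size, copies):
    """Split [0, file_size) into `copies` contiguous byte ranges, one per step copy."""
    base, extra = divmod(file_size, copies)
    ranges, start = [], 0
    for i in range(copies):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def read_block(path, start, end, delimiter=";"):
    """Return the rows whose first byte lies in [start, end), as lists of raw bytes."""
    sep = delimiter.encode()
    rows = []
    with open(path, "rb") as f:
        if start > 0:
            # Step back one byte and discard through the next newline. This
            # lands on the first row boundary at or after `start`, so a row
            # cut by the block edge is left to the copy that owns its start.
            f.seek(start - 1)
            f.readline()
        while f.tell() < end:
            line = f.readline()
            if not line:
                break
            rows.append(line.rstrip(b"\r\n").split(sep))
    return rows


class LazyField:
    """Keep a field as raw bytes and convert it only when a value is requested."""

    def __init__(self, raw, convert):
        self.raw = raw            # passed through unchanged if never converted
        self._convert = convert   # e.g. int, float, or a date parser
        self._value = None
        self._converted = False

    def value(self):
        if not self._converted:
            self._value = self._convert(self.raw.decode())
            self._converted = True
        return self._value
```

In this sketch, each copy would call read_block with its own range from block_ranges, and together the copies cover every row exactly once. A field that is only passed through can be written out from LazyField.raw without ever calling value(), which is the work that lazy conversion avoids.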
An example of a simple CSV input transformation (CSV Input - Reading customer data.ktr) can be found in the data-integration/samples/transformations directory.