Content tab

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15


Use the following options in the Content tab to specify the format of the source files.

Option Description
Filetype Select either CSV or Fixed length. Depending on the file type you select, a corresponding interface appears when you click Get Fields in the Fields tab.
Separator Specify the character used to separate the fields in a single line of text, typically a semicolon or tab. Click Insert Tab to place a tab in the Separator field. The default value is semicolon (;).
Enclosure Specify an optional character used to enclose a field if that field contains a separator character. The default value is double quotation marks (").
Allow breaks in enclosed fields Not implemented.
Escape Specify one or more characters that mark the character following them as part of the regular text. For example, if a backslash (\) is the escape character and a single quote (') is an enclosure or separator character, then the text Not the nine o\'clock news is parsed as Not the nine o'clock news.
Header Select if your text file has a header (the first lines in the file). Use Number of header lines to specify how many header lines the file has.
Footer Select if your text file has a footer (the last lines in the file). Use Number of footer lines to specify how many footer lines the file has.
Wrapped lines Select if you work with data lines that have wrapped beyond a specific page limit. You can use Number of times wrapped to specify the number of times the line is wrapped. Headers and footers are never considered wrapped.
Paged layout (printout) Select when other text handling options (above) fail on a text file designed to be output to a line printer. You can use Document header lines to skip introductory texts and Number of lines per page to position the data lines.
Compression Select if your text file is in a ZIP or GZip archive. Only the first file in the archive is read.
No empty rows Select if you do not want to send empty rows to the next steps.
Include filename in output Select if you want the file name to be part of the output, and use Filename fieldname to enter the name of the field that contains the file name.
Rownum in output Select if you want the row number to be part of the output. You can use Rownum fieldname to enter the name of the field that contains the row number. Select Rownum by file if you want to allow the row number to be reset per file.
Format Select the file format: DOS, UNIX, or mixed. UNIX files have lines terminated by a line feed. DOS files have lines terminated by a carriage return followed by a line feed. If you specify mixed, no verification is done.
Encoding Select the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, the PDI client searches your system for available encodings.
Length Select the length of the field according to its type: Characters or Bytes.
Limit Specify a limit on the number of records generated from this step. Specify zero (0) for an unlimited number of records.
Be lenient when parsing dates? Clear the check box if you want strict parsing of date fields. If selected, an out-of-range date such as Jan 32nd becomes Feb 1st.
The date format Locale Specify the locale to use to parse dates written in full, such as February 2nd, 2006. For example, parsing February 2nd, 2006, on a system set to French (fr_FR) would not work because February is called Février in that locale.
Add filenames to result Select to add file names to a resulting list of file names.
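
To make the interaction of several options above more concrete, the following is a minimal Python sketch of how Separator, Enclosure, Escape, Header, No empty rows, and Limit might combine when reading a delimited file. It is not PDI code and does not reflect how the step is implemented. The helper name read_rows and its parameters are hypothetical, and the parsing behavior shown is that of Python's standard csv module, which is assumed here to be a reasonable stand-in.

```python
import csv
import io


def read_rows(text, separator=";", enclosure='"', escape=None,
              header_lines=0, no_empty_rows=False, limit=0):
    """Hypothetical helper that mimics a few Content tab options.

    separator, enclosure, escape -> Separator, Enclosure, Escape
    header_lines                 -> Header / Number of header lines
    no_empty_rows                -> No empty rows
    limit                        -> Limit (0 means unlimited)
    """
    reader = csv.reader(
        io.StringIO(text, newline=""),
        delimiter=separator,
        quotechar=enclosure,
        escapechar=escape,
    )
    rows = []
    for index, row in enumerate(reader):
        if index < header_lines:
            continue  # skip the header lines
        if no_empty_rows and not any(field.strip() for field in row):
            continue  # drop rows that carry no data
        rows.append(row)
        if limit and len(rows) >= limit:
            break  # stop once the record limit is reached
    return rows


# Sample input: one header line, an escaped enclosure character,
# an empty line, and a separator inside an enclosed field.
sample = (
    "id;title\n"
    "1;'Not the nine o\\'clock news'\n"
    "\n"
    "2;'Semi;colon inside'\n"
    "3;plain\n"
)

rows = read_rows(sample, enclosure="'", escape="\\",
                 header_lines=1, no_empty_rows=True, limit=2)
print(rows)
# [['1', "Not the nine o'clock news"], ['2', 'Semi;colon inside']]
```

In this sketch the escaped single quote stays inside the field, the semicolon inside the enclosure is not treated as a separator, the empty line is dropped, and reading stops after two records.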