Content tab

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15


In the Content tab, you can specify the format of the text files that are being read.
| Option | Description |
| --- | --- |
| Filetype | Select either CSV or Fixed length. Based on this selection, the PDI client launches a different helper GUI when you click Get Fields in the Fields tab. |
| Separator | One or more characters that separate the fields in a single line of text. Typically, this is a semicolon (;) or a tab. |
| Enclosure | Some fields can be enclosed by a pair of strings to allow separator characters in fields. The enclosure string is optional. |
| Allow breaks in enclosed fields | Not implemented. |
| Escape | Specify an escape character (or characters) if your data contains such characters. For example, if a backslash (`\`) is the escape character and a single quote (`'`) is the enclosure, the text `Not the nine o\'clock news` is parsed as `Not the nine o'clock news`. |
| Header & Number of header lines | Select if your text file has header rows (the first lines in the file). You can specify the number of header lines. |
| Footer & Number of footer lines | Select if your text file has footer rows (the last lines in the file). You can specify the number of footer lines. |
| Wrapped lines & Number of times wrapped | Select if you work with data lines that have wrapped beyond a specific page limit, and specify the number of times each line wraps. Headers and footers are never considered wrapped. |
| Paged layout (printout), Number of lines per page, & Document header lines | Use these options as a last resort when working with text meant for printing on a line printer. Use the number of document header lines to skip introductory text and the number of lines per page to position the data lines. |
| Compression | Use this field if your text file is in a ZIP or GZIP archive. Only the first file in the archive is read. |
| No empty rows | Select if you do not want to send empty rows to the next steps. |
| Include filename in output? | Select if you want the file name to be part of the output. |
| Filename fieldname | Enter the name of the output field that contains the file name. |
| Rownum in output? | Select if you want the row number to be part of the output. |
| Rownum fieldname & Rownum by file? | Enter the name of the output field that contains the row number. Select Rownum by file? to restart the row numbering for each file. |
| Format | Can be DOS, UNIX, or mixed. UNIX files have lines that are terminated by line feeds. DOS files have lines that are terminated by carriage returns and line feeds. If you specify mixed, no verification is done. |
| Encoding & Limit | Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, the PDI client searches your system for available encodings. |
| Be lenient when parsing dates? | Clear this check box if you want strict parsing of date fields. If selected, dates like Jan 32nd become Feb 1st. |
| The date format Locale | This locale is used to parse dates that have been written in full, such as February 2nd, 2016. Parsing this date on a system running in the French (fr_FR) locale would not work because February is called Février in that locale. |
| Add filenames to result | Select to add the names of the files that are read to the result filenames list. |
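
To illustrate how the Separator, Enclosure, and Escape options interact, the following is a minimal sketch that uses Python's standard `csv` module as a stand-in. It is not how the PDI client parses files. The helper name `parse_line` is hypothetical, and unlike PDI, `csv` accepts only a single-character separator.

```python
import csv


def parse_line(line, separator=";", enclosure="'", escape="\\"):
    """Split one line of text into fields (hypothetical helper, not PDI code).

    separator -> the character between fields
    enclosure -> the character that wraps fields containing separators
    escape    -> the character that makes the next character literal
    """
    reader = csv.reader(
        [line],
        delimiter=separator,
        quotechar=enclosure,
        escapechar=escape,
    )
    return next(reader)


# The escaped quote stays inside the first field, and the enclosed
# semicolon in the second field is not treated as a separator.
fields = parse_line(r"'Not the nine o\'clock news';'a;b';42")
print(fields)
```

Without the escape character, the single quote in `o'clock` would close the enclosure early and the rest of the text would be read as part of a new, unenclosed value.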
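
The effect of the Header, Footer, and No empty rows options can be pictured as trimming and filtering the lines of the file before they become rows. The sketch below is an illustration only. The function `data_lines` and its parameters are hypothetical, and it assumes that an "empty row" is a line containing nothing but whitespace, which may differ from the PDI client's own definition.

```python
def data_lines(text, header_lines=1, footer_lines=0, no_empty_rows=True):
    """Return the data lines of a text file (hypothetical helper, not PDI code)."""
    lines = text.splitlines()  # handles both UNIX and DOS line endings

    # Drop the header lines at the top and the footer lines at the bottom.
    end = len(lines) - footer_lines
    body = lines[header_lines:end]

    # Assumption: an empty row is a line with only whitespace.
    if no_empty_rows:
        body = [line for line in body if line.strip()]
    return body


sample = "id;name\n1;Ann\n\n2;Bob\ntotal;2\n"
print(data_lines(sample, header_lines=1, footer_lines=1))
```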
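
The Compression option reads only the first file in an archive. As a rough sketch of that behavior outside PDI, the following uses Python's standard `zipfile` and `gzip` modules. The function `read_first_entry` and the archive contents are hypothetical and exist only for this example.

```python
import gzip
import io
import zipfile


def read_first_entry(data, compression, encoding="utf-8"):
    """Decode the text of the first file in an archive (hypothetical helper)."""
    if compression == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            first = archive.namelist()[0]  # entries are listed in archive order
            return archive.read(first).decode(encoding)
    if compression == "gzip":
        # A GZIP stream holds a single compressed file.
        return gzip.decompress(data).decode(encoding)
    return data.decode(encoding)


# Build a small in-memory ZIP archive with two files.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("first.csv", "id;name\n1;Ann\n")
    archive.writestr("second.csv", "id;name\n2;Bob\n")

# Only the content of first.csv is returned; second.csv is ignored.
text = read_first_entry(buffer.getvalue(), "zip")
print(text)
```

If an archive holds several text files that all need to be read, this behavior suggests extracting them first and pointing the step at the extracted files.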
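
The difference between lenient and strict date parsing can be sketched as follows. This is an illustration of the rollover described for Be lenient when parsing dates?, not the PDI client's implementation. The function `resolve_date` is hypothetical and only handles an out-of-range day, not an out-of-range month.

```python
from datetime import date, timedelta


def resolve_date(year, month, day, lenient=True):
    """Turn year, month, and day into a date (hypothetical helper, not PDI code)."""
    if not lenient:
        # Strict: an impossible day such as January 32nd raises ValueError.
        return date(year, month, day)
    # Lenient: count forward from the first of the month, so surplus
    # days roll over into the following month.
    return date(year, month, 1) + timedelta(days=day - 1)


print(resolve_date(2016, 1, 32))  # lenient: rolls over to 2016-02-01

try:
    resolve_date(2016, 1, 32, lenient=False)
except ValueError as error:
    print("strict parsing rejected the date:", error)
```

Lenient parsing keeps rows flowing when the source contains sloppy dates, at the cost of silently changing values. Strict parsing surfaces those rows as errors instead.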