Input tab

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

Use this tab to select how data moves from PDI fields to Python variables. Choose whether to process your data Row by row (standard PDI behavior) or All rows at once.

The All rows option is commonly used for data frames. A data frame stores tabular data and is composed of a list of vectors of equal length. Because data frames combine the behavior of lists and matrices, they are well suited to the analytical needs of statistical data. For example, data scientists may want to bring in a training dataset before an actual dataset. The training dataset can contain multiple types of data, which allows for a broader scope without the need to join data ahead of time. The data scientist can then work with the entire set in the training data frame.
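The "list of vectors of equal length" idea can be illustrated with a minimal sketch in plain Python. This is not the step's own API: the names training_frame and validate_frame and the sample values are hypothetical, and a real script would more likely use a data frame library.

```python
from statistics import mean

# Hypothetical training data: each key is a column (vector) and
# every column holds the same number of values.
training_frame = {
    "age": [34, 45, 29],
    "income": [52000.0, 61000.0, 48000.0],
    "churned": [False, True, False],
}


def validate_frame(frame):
    """Return the row count, raising ValueError if column lengths differ."""
    lengths = {len(column) for column in frame.values()}
    if len(lengths) > 1:
        raise ValueError("all columns in a data frame must have equal length")
    return lengths.pop() if lengths else 0


row_count = validate_frame(training_frame)

# With the whole column available, a statistic over all rows is one call.
mean_income = mean(training_frame["income"])
```

Note that the columns hold different types (integers, floats, Booleans), which is the mixed-type behavior the paragraph above describes.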

CAUTION:
When using the AEL engine for processing data Row by row or All rows, you can only have one input step in your transformation. If you include multiple input steps, the first step will be used and the subsequent steps will be ignored.

Selecting the Row by row option limits your input to one type of data: each record reflects only the row being read at that point in time. Selecting the All rows option broadens the depth and scope of your dataset, because the whole set is available at once.
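The difference between the two options can be sketched as follows. This is an illustrative example in plain Python under assumed names (rows, process_row, process_all); it does not show how the step itself passes variables to a script.

```python
# Hypothetical incoming rows, one dictionary per PDI row.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": 30.0},
    {"id": 3, "amount": 20.0},
]


def process_row(row):
    """Row by row: the logic sees a single record at a time."""
    return row["amount"] * 2


def process_all(all_rows):
    """All rows: the logic sees the whole dataset, so aggregates are possible."""
    amounts = [r["amount"] for r in all_rows]
    average = sum(amounts) / len(amounts)
    # Center each amount on the average of the full set.
    return [a - average for a in amounts]


doubled = [process_row(r) for r in rows]
centered = process_all(rows)
```

The centering calculation needs the average of every row, so it fits the All rows option. The doubling calculation depends only on the current record, so it works Row by row.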