The Sort rows step includes options to define your output and how you want to sort rows.
Option | Description |
---|---|
Sort directory | Select a directory in which temporary files can be stored, if needed. If you leave this blank, the temporary files are stored in the default temporary directory for the system. Click Browse to select a different directory. |
TMP-file prefix | Specify a prefix for the temporary files. This helps you identify any files generated by the transformation in the temp directory. |
Sort size (rows in memory) | Specify the number of rows to sort in memory. A larger number improves the sort speed because fewer temporary files are generated, which reduces input/output processing. The default is 1000000 rows. Note: You may encounter an Out Of Memory Exception (OOME) if the number of rows in memory exceeds the number specified in this option. To resolve the OOME, either lower the sort size specified here or increase your available memory. See the Install Pentaho Data Integration and Analytics document. |
Free memory threshold (in %) | Specify a percentage number as the threshold. If the sort algorithm has less available free memory than the indicated number, it will begin paging data to disk. This percentage varies per individual production environment because the threshold is re-verified every 1000 rows. The row size, the complexity of the transformation, or other steps in the transformation could still lead to an Out Of Memory Error. In a JVM, the exact amount of free memory varies. As a best practice, use this step in less complex transformations that do not require several steps or several processes that are contending for memory. |
Compress TMP Files | Select this option to compress any temporary files that are generated to complete the sort. Clear this option to leave temporary files uncompressed. |
Only pass unique rows? (verifies keys only) | Select this option to pass only unique rows to the output stream(s). Uniqueness is checked against the sort key fields only, not the entire row. Clear this option to pass all rows to the output stream(s). |
Fields table | Specify the fields and direction, ascending or descending, to sort. You can also specify whether to perform a case-sensitive sort. Click the column titles to sort each column. |
Get Fields | Click Get Fields to retrieve a list of all the incoming fields on the stream(s). |
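
The interaction between the sort size, temporary files, compression, and unique-row options can be illustrated with a small external merge sort. The following Python sketch is not the step's actual implementation. The function `sort_rows` and its parameters (`sort_size`, `sort_dir`, `tmp_prefix`, `compress`, `unique`) are hypothetical names chosen to mirror the options above. It assumes that rows are picklable and that every chunk is written to disk, and it does not model the free memory threshold.

```python
import gzip
import heapq
import itertools
import os
import pickle
import tempfile


def _write_run(rows, sort_dir, prefix, compress):
    """Write one sorted chunk to a temporary file and return its path."""
    # dir=None falls back to the system default temporary directory.
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".tmp", dir=sort_dir)
    os.close(fd)
    opener = gzip.open if compress else open
    with opener(path, "wb") as fh:
        for row in rows:
            pickle.dump(row, fh)
    return path


def _read_run(path, compress):
    """Stream rows back from a temporary file, one at a time."""
    opener = gzip.open if compress else open
    with opener(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return


def sort_rows(rows, key, sort_size=1_000_000, sort_dir=None,
              tmp_prefix="sort_", compress=False, unique=False):
    """Yield rows in key order, holding at most `sort_size` rows per chunk.

    All names here are illustrative assumptions, not a PDI API.
    """
    rows = iter(rows)
    runs = []
    try:
        # Phase 1: sort each chunk in memory and spill it to a temp file.
        while True:
            chunk = sorted(itertools.islice(rows, sort_size), key=key)
            if not chunk:
                break
            runs.append(_write_run(chunk, sort_dir, tmp_prefix, compress))

        # Phase 2: merge the sorted runs lazily.
        merged = heapq.merge(*(_read_run(p, compress) for p in runs), key=key)

        last_key = object()  # sentinel that equals no real key
        for row in merged:
            k = key(row)
            # "Unique" compares sort keys only, not whole rows.
            if unique and k == last_key:
                continue
            last_key = k
            yield row
    finally:
        # Remove the temporary files once the output is consumed.
        for p in runs:
            os.remove(p)


rows = [("b", 2), ("A", 1), ("a", 3), ("c", 0)]
case_insensitive = lambda r: r[0].lower()
print(list(sort_rows(rows, key=case_insensitive, sort_size=2, unique=True)))
# prints [('A', 1), ('b', 2), ('c', 0)]
```

In this sketch, a smaller `sort_size` produces more temporary files and more merge work, which matches the trade-off described for the Sort size option. With `unique=True` and a case-insensitive key, the row `("a", 3)` is dropped because its key matches that of `("A", 1)`, even though the rows themselves differ.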