Pentaho Data Integration
Kettle
Get started with the PDI client
Starting the PDI client
If you used the Pentaho Installation Wizard to install Pentaho
If you used an archive or manual installation to install Pentaho
Use the PDI client perspectives
Data Integration perspective
Schedule perspective
Customize the PDI client
Use a Pentaho Repository in PDI
Get started with a Pentaho Repository
Create a connection in the PDI client
Connect to a Pentaho Repository
Manage repositories in the PDI client
Repository Manager
Connection details
Unsupported repositories
Database repository
File repository
Use the Repository Explorer
Access the Repository Explorer window
With LDAP authentication, the PDI Repository Explorer is empty
Create a new folder in the repository
Open a folder, job, or transformation
Rename a folder, job, or transformation
Move objects
Restore objects
Delete a folder, job, or transformation
Use Pentaho Repository access control
Lock and unlock jobs and transformations
Lock a job or transformation
View lock notes
Unlock a job or transformation
Access connection, security, and cluster information
Set folder-level permissions
Use version history
Open a version of a file
Restore a version of a file
Enable or disable tracking of version history and comments
Advanced topics
Data Integration perspective in the PDI client
Basic concepts of PDI
Transformations
Jobs
Hops
PDI client options
Work with transformations
Create a transformation
Open a transformation
On your local machine
In the Pentaho Repository
On Virtual File Systems
Save a transformation
On your local machine
In the Pentaho Repository
On Virtual File Systems
Run your transformation
Run configurations
Select an engine
Pentaho Engine
Spark Engine
Options
Parameters and Variables
Analyze your transformation results
Step metrics
Logging
Execution history
Performance graph
Metrics
Preview data
Inspect data
Inspect your data
Get started
Tour the environment
Use visualizations
Save your inspection session
Use tabs to create multiple visualizations
Visualization types
Use filters to explore your data
Drill down into your visualization
Keep or exclude selected data in your visualization
Add a filter using the Filters panel
Remove a filter from the Filters panel
Filter examples
Work with filters in Stream View
Use the Select or Exclude filters
Select filter example
Work with filters in Model View
Filter functions
Keyboard shortcuts for filter options
Publish for collaboration
Stop your transformation
Use the Transformation menu
Adjust transformation properties
Work with jobs
Create a job
Open a job
On your local machine
In the Pentaho Repository
On Virtual File Systems
Save a job
On your local machine
In the Pentaho Repository
On Virtual File Systems
Run your job
Run configurations
Pentaho engine
Options
Parameters and variables
Stop your job
Use the Job menu
Adjust job properties
Add notes to transformations and jobs
Create a note
Edit a note
Reposition a note
Delete a note
Adaptive Execution Layer
Connecting to Virtual File Systems
Before you begin
Access to Google Cloud
Access to HCP
Access to Pentaho Data Catalog
Access to Microsoft Azure
Create a VFS connection
Edit a VFS connection
Delete a VFS connection
Access files with a VFS connection
Pentaho address to a VFS connection
Steps and entries supporting VFS connections
VFS browser
Before you begin
Access to a Google Drive
Set up HCP credentials
Access files with the VFS browser
Supported steps and entries
Configure VFS options
Logging and performance monitoring
Set up transformation logging
Set up job logging
Logging levels
Monitor performance
Sniff Test tool
Monitoring tab
Use performance graphs
PDI performance tuning tips
Logging best practices
Advanced topics
Understanding PDI data types and field metadata
Data type mappings
Using the correct data type for math operations
Using the fields table properties
Applying formatting
Applying calculations and rounding
Output type examples
PDI run modifiers
Arguments
Parameters
VFS properties
Specifying VFS properties as parameters
Configure SFTP VFS
Variables
Environment variables
Kettle variables
Set Kettle variables in the PDI client
Set Kettle variables manually
Set Kettle or Java environment variables in the Pentaho MapReduce job entry
Set the LAZY_REPOSITORY variable in the PDI client
Internal variables
Use checkpoints to restart jobs
Add a checkpoint
Delete a checkpoint
Set up a checkpoint log
Use the SQL Editor
Use the Database Explorer
Transactional databases and job rollback
Make a transformation database transactional
Make a job database transactional
Web services steps
Schedule perspective in the PDI client
Schedule a transformation or job
Edit a scheduled run of a transformation or job
Stop a schedule from running
Enable or disable a schedule from running
Delete a scheduled run of a transformation or job
Refresh the schedule list
Streaming analytics
Get started with streaming analytics in PDI
Data ingestion
Data processing
Advanced topics
PDI and Hitachi Content Platform (HCP)
PDI and Data Catalog
Prerequisites
Supported file types
PDI and Snowflake
Snowflake job entries in PDI
Copybook steps in PDI
Copybook transformation steps in PDI
Work with the Streamlined Data Refinery
How does SDR work?
App Builder, CDE, and CTools
Get started with App Builder
Community Dashboard Editor and CTools
Install and configure the Streamlined Data Refinery
Installing and configuring the SDR sample
Install Pentaho software
Download and install the SDR sample
Configure KTR files for your environment
Clean up the All Requests Processed list
Install the Vertica JDBC driver
Use Hadoop with the SDR
App endpoints for SDR forms
App Builder and Community Dashboard Editor
Get started with App Builder
Community Dashboard Editor and CTools
Use the Streamlined Data Refinery
How to use the SDR sample form
Edit the Movie Ratings - SDR Sample form
Building blocks for the SDR
Use the Build Model job entry for SDR
Create a Build Model job entry
Select existing model options
Variables for Build Model job entry
Using the Annotate Stream step
Use the Annotate Stream step
Creating measures on stream fields
Create a measure on a stream field
Creating attributes
Create an attribute on a field
Creating link dimensions
Create a link dimension
Create a dimension key
Creating annotation groups
Create an annotation group for sharing with other users
Create an annotation group locally
Metadata injection support
Using the Shared Dimension step for SDR
Create a shared dimension
Create a dimension key in Shared Dimension step
Metadata injection support
Using the Publish Model job entry for SDR
Use the Publish Model job entry
Use Command Line Tools to Run Transformations and Jobs
Pan Options and Syntax
Pan Status Codes
Kitchen Options and Syntax
Kitchen Status Codes
Import KJB or KTR Files From a Zip Archive
Connect to a Repository with Command-Line Tools
Export Content from Repositories with Command-Line Tools
Using Pan and Kitchen with a Hadoop cluster
Using the PDI client
Using the Pentaho Server
Use Carte Clusters
About Carte Clusters
Set Up a Carte Cluster
Carte Cluster Configuration
Configure a static Carte cluster
Configure a Dynamic Carte Cluster
Configure a Carte Master Server
Configure Carte Slave Servers
Tuning Options
Configuring Carte Servers for SSL
Change Jetty Server Parameters
In the Carte Configuration file
In the Kettle Configuration file
Initialize Slave Servers
Create a cluster schema
Run transformations in a cluster
Schedule Jobs to Run on a Remote Carte Server
Stop Carte from the Command Line Interface or URL
Run Transformations and Jobs from the Repository on the Carte Server
Connecting to a Hadoop cluster with the PDI client
Audience and prerequisites
Using the Apache Hadoop driver
Install a driver for the PDI client
Adding a cluster connection
Add a cluster connection by import
Add a cluster connection manually
Add security to cluster connections
Specify Kerberos security
Specify Knox security
Configure and test connection
Managing Hadoop cluster connections
Edit Hadoop cluster connections
Duplicate a Hadoop cluster connection
Delete a Hadoop cluster connection
Connect other Pentaho components to a cluster
Adaptive Execution Layer
Set Up AEL
Use AEL
Recommended PDI steps to use with Spark on AEL
Vendor-specific setups for Spark
AEL logging
Activating AEL logging
Configuring AEL logging
Modify the XML file
Advanced topics
Troubleshooting
Partitioning data
Get started
Partitioning during data processing
Understand repartitioning logic
Partitioning data over tables
Use partitioning
Use data swimlanes
Rules for partitioning
Partitioning clustered transformations
Learn more
Pentaho Data Services
Creating a regular or streaming Pentaho Data Service
Data service badge
Open or edit a Pentaho Data Service
Delete a Pentaho Data Service
Test a Pentaho Data Service
Run a basic test
Run a streaming optimization test
Run an optimization test
Examine test results
Pentaho Data Service SQL support reference and other development considerations
Supported SQL literals
Supported SQL clauses
Other development considerations
Optimize a Pentaho Data Service
Apply the service cache optimization
How the service cache optimization technique works
Adjust the cache duration
Disable the cache
Clear the cache
Apply a query pushdown optimization
How the query pushdown optimization technique works
Add the query pushdown parameter to the Table Input or MongoDB Input steps
Set up query pushdown parameter optimization
Disable the query pushdown optimization
Apply a parameter pushdown optimization
How the parameter pushdown optimization technique works
Add the parameter pushdown parameter to the step
Set up parameter pushdown optimization
Apply streaming optimization
How the streaming optimization technique works
Adjust the row or time limits
Publish a Pentaho Data Service
Share a Pentaho Data Service with others
Connect to the Pentaho Data Service from a Pentaho tool
Connect to the Pentaho Data Service from a non-Pentaho tool
Step 1: Download the Pentaho Data Service JDBC driver
Download using the PDI client
Download manually
Step 2: Install the Pentaho Data Service JDBC driver
Step 3: Create a connection from a non-Pentaho tool
Query a Pentaho Data Service
Example
Monitor a Pentaho Data Service
Data lineage
Sample use cases
Architecture
Setup
API
Steps and entries with custom data lineage analyzers
Contribute additional step and job entry analyzers to the Pentaho Metaverse
Examples
Create a new Maven project
Add dependencies
Create a class which implements IStepAnalyzer
Create the Blueprint configuration
Build and test your bundle
See it in action
Different types of step analyzers
Field manipulation
External resource
Connection-based external resource
Adding analyzers from existing PDI plug-ins (non-OSGi)
Use the Pentaho Marketplace to manage plugins
View installed plugins and versions
Install plugins
Customize PDI Data Explorer
Use discrete axis for line, area, and scatter charts
Set discrete axes for time dimensions in PDI Data Explorer
Set discrete axes for number dimensions in PDI Data Explorer
Troubleshooting possible data integration issues
Troubleshooting transformation steps and job entries
'Missing plugins' error when a transformation or job is opened
Cannot execute or modify a transformation or job
Step is already on canvas error
Troubleshooting database connections
Unsupported databases
Database locks when reading and updating from a single table
Force PDI to use DATE instead of TIMESTAMP in Parameterized SQL queries
PDI does not recognize changes made to a table
Jobs scheduled on Pentaho Server cannot execute transformation on remote Carte server
Cannot run a job in a repository on a Carte instance from another job
Troubleshoot Pentaho data service issues
Kitchen and Pan cannot read files from a ZIP export
Using ODBC
Improving performance when writing multiple files
Snowflake timeout errors
Log table data is not deleted
Data Catalog searches returning incomplete or missing data
PDI transformation steps
Abort
General
Options
Logging
Add a Checksum
Options
Example
Metadata injection support
Add sequence
General
Database generated sequence
PDI transformation counter generated sequence
AMQP Consumer
Before you begin
General
Create and save a new child transformation
Options
Setup tab
Create a new AMQP Message Queue
Use an existing AMQP Message Queue
Specify Routing Keys
Specify Headers
Security tab
Batch tab
Fields tab
Result Fields tab
Metadata injection support
See also
AMQP Producer
Before you begin
General
Options
Setup tab
Security tab
Metadata injection support
See also
Avro Input
Select an engine
Using the Avro Input step on the Pentaho engine
General
Options
Source tab
Embedded schema
Separate schema
Avro Fields tab
Lookup Fields tab
Sample transformation walkthrough using the Lookup field
Metadata injection support
Using the Avro Input step on the Spark engine
General
Options
Source tab
Embedded schema
Separate schema
Avro Fields tab
Lookup Fields tab
Sample transformation walkthrough using the Lookup field
Metadata injection support
Avro Output
Select an engine
Using the Avro Output step on the Pentaho engine
General
Options
Fields tab
Schema tab
Options tab
Metadata injection support
Using the Avro Output step on the Spark engine
General
Options
Fields tab
Schema tab
Options tab
Metadata injection support
Calculator
General
Options
Calculator functions list
Troubleshooting the Calculator step
Length and precision
Data Types
Rounding method for the Round (A, B) function
Cassandra Input
AEL considerations
Options
CQL SELECT query
WHERE Clause
Metadata injection support
Cassandra Output
AEL considerations
General
Options
Connection tab
Write options tab
Schema options tab
Update table metadata
Pre-Insert CQL
Metadata injection support
Catalog Input
Before you begin
General
Input tab
Fields tab
CSV fields
Parquet fields
PDI types
Catalog Output
Before you begin
General
File tab
Metadata tab
Fields tab
CSV fields
Parquet fields
Options tab
CSV options
Parquet options
Common Formats
Date formats
Number formats
Copybook Input
Before you begin
General
Options
Input tab
Output tab
Options tab
Use Error Handling
Metadata injection support
CouchDB Input
AEL considerations
Options
Metadata injection support
CSV File Input
Options
Fields
Metadata injection support
Data types
Delete
General
The key(s) to look up the value(s) table
Metadata injection support
ElasticSearch Bulk Insert (deprecated)
Before you begin
General
Options
General tab
Servers tab
Fields tab
Settings tab
Reference information
Elasticsearch REST Bulk Insert
Before you begin
General
Options
General tab
Document tab
Creating a document to index with stream field data
Using an existing JSON document from a field
Output tab
ETL metadata injection
General
Options
Inject Metadata tab
Specify the source field
Injecting metadata into the ETL Metadata Injection step
Options tab
Example
Input data
Transformations
Results
Reference links
Articles
Video
Steps supporting metadata injection
Execute Row SQL Script
General
Output fields
Metadata injection support
Execute SQL Script
Notes
General
Options
Optional statistic fields
Example
Metadata injection support
File exists (Step)
Get records from stream
General
Options
Metadata injection support
See also
Get rows from result
General
Options
Metadata injection support
Get System Info
General
Data types
Metadata injection support
Group By
Select an engine
Using the Group By step on the Pentaho engine
General
The fields that make up the group table
Aggregates table
Examples
Metadata injection support
Using the Group By step on the Spark engine
General
The fields that make up the group table
Aggregates table
Examples
Metadata injection support
Hadoop File Input
Select an engine
Using the Hadoop File Input step on the Pentaho engine
General
Options
File tab
Accepting file names from a previous step
Show action buttons
Selecting a file using regular expressions
Open file
Content tab
Error Handling tab
Filters tab
Fields tab
Number formats
Scientific notation
Date formats
Metadata injection support
Using the Hadoop File Input step on the Spark engine
General
Options
File tab
Accepting file names from a previous step
Show action buttons
Selecting a file using regular expressions
Open file
Content tab
Error Handling tab
Filters tab
Fields tab
Number formats
Scientific notation
Date formats
Metadata injection support
Hadoop File Output
Select an engine
Using the Hadoop File Output step on the Pentaho engine
General
Options
File tab
Content tab
Fields tab
Metadata injection support
Using the Hadoop File Output step on the Spark engine
General
Options
File tab
Content tab
Fields tab
Metadata injection support
HBase Input
Select an engine
Using the HBase Input step on the Pentaho engine
General
Options
Configure query tab
Key fields table
Create/Edit mappings tab
Fields
Additional notes on data types
Filter result set tab
Fields
Namespaces
Performance considerations
Metadata injection support
Using the HBase Input step on the Spark engine
General
Options
Configure query tab
Key fields table
Create/Edit mappings tab
Fields
Additional notes on data types
Filter result set tab
Fields
Metadata injection support
HBase Output
Select an engine
Using the HBase Output step on the Pentaho engine
General
Options
Configure connection tab
Create/Edit mappings tab
Performance considerations
Metadata injection support
Using the HBase Output step on the Spark engine
General
Options
Configure connection tab
Create/Edit mappings tab
Performance considerations
Metadata injection support
HBase row decoder
General
Options
Configure fields tab
Create/Edit mappings tab
Key fields table
Additional notes on data types
Using HBase Row Decoder with Pentaho MapReduce
Metadata injection support
HBase setup for Spark
Set up the application properties file
Set up the vendor-specified JARs
Using HBase steps with Amazon EMR 5.21
Specify the parameter in the properties file
Specify the parameter in Transformation properties
Specify the parameter as an environment variable in PDI
Java filter
General
Options
Filter expression examples
JMS Consumer
Before you begin
General
Create and save a new child transformation
Options
Setup tab
Security tab
Batch tab
Fields tab
Result fields tab
Metadata injection support
See also
JMS Producer
Before you begin
General
JMS connection information
Setup tab
Security tab
Options tab
Properties tab
Metadata injection support
See also
Job Executor
Samples
General
Options
Parameters tab
Execution results tab
Row grouping tab
Results rows tab
Result files tab
JSON Input
General
Options
File tab
Selected files table
Content tab
Fields tab
Select fields
Additional output fields tab
Examples
Metadata injection support
Kafka consumer
AEL considerations
General
Create and save a new child transformation
Options
Setup tab
Batch tab
Fields tab
Result fields tab
Options tab
Metadata injection support
See also
Kafka Producer
General
Options
Setup tab
Options tab
Metadata injection support
See also
Kinesis Consumer
AEL considerations
General
Create and save a new child transformation
Options
Setup tab
Batch tab
Fields tab
Result fields tab
Options tab
Metadata injection support
See also
Kinesis Producer
AEL considerations
General
Options
Setup tab
Options tab
Metadata injection support
See also
Mapping
General
Log lines in Kettle
Options
Parameters tab
Input tab
Add inputs to table
Output tab
Mapping Input Specification
Options
Mapping Output Specification
Options
Samples
MapReduce Input
AEL considerations
Options
Metadata injection support
MapReduce Output
AEL considerations
Options
Metadata injection support
Memory Group By
General
The fields that make up the group table
Aggregates table
Metadata injection support
Merge rows (diff)
Select an engine
Using Merge rows (diff) on the Pentaho engine
General
Options
Examples
Metadata injection support
Using Merge rows (diff) on the Spark engine
General
Options
Examples
Metadata injection support
Microsoft Excel Input
General
Options
Files tab
Selected files table
Sheets tab
Content tab
Error Handling tab
Fields tab
Additional output fields tab
Metadata injection support
Microsoft Excel Output
Options
Content tab
Custom tab
Header font
Row font
Fields tab
Metadata injection support
Microsoft Excel Writer
General
Options
File & Sheet tab
File panel
Sheet panel
Template panel
Content tab
Content options panel
When writing to existing sheet panel
Fields panel
Metadata injection support
Modified Java Script Value
General
Java script functions pane
Java Script pane
Script types
Fields table
Modify values
JavaScript Internal API Objects
Examples
Check for the existence of fields in a row
Add a new field in a row
Use NVL in JavaScript
Split fields
Comparing values
String values
Numeric values
Filter rows
Sample transformations
Mondrian Input
General
MongoDB Input
AEL considerations
General
Options
Configure connection tab
Input options tab
Tag set specification table
Query tab
Fields tab
Examples
Query expression
Aggregate pipeline
Metadata injection support
MongoDB Output
AEL considerations
General
Options
Configure connection tab
Output options tab
Mongo document fields tab
Example
Input data
Document field definitions
Document structure
Create/drop indexes tab
Create/drop indexes example
Metadata injection support
MQTT Consumer
Select an engine
Using the MQTT Consumer step on the Pentaho engine
General
Create and save a new child transformation
Options
Setup tab
Security tab
Batch tab
Fields tab
Result fields tab
Options tab
Metadata injection support
Using the MQTT Consumer step on the Spark engine
General
Create and save a new child transformation
Options
Setup tab
Security tab
Batch tab
Fields tab
Result fields tab
Options tab
Using MQTT with SSL on AEL
Metadata injection support
MQTT Producer
General
Options
Setup tab
Security tab
Options tab
Metadata injection support
See also
ORC Input
Select an engine
Using the ORC Input step on the Pentaho engine
Options
Fields
ORC types
Metadata injection support
Using the ORC Input step on the Spark engine
Options
Fields
AEL types
Metadata injection support
ORC Output
Select an engine
Using the ORC Output step on the Pentaho engine
General
Options
Fields tab
ORC types
Options tab
Metadata injection support
Using the ORC Output step on the Spark engine
General
Options
Fields tab
AEL types
Options tab
Metadata injection support
Parquet Input
Select an engine
Using Parquet Input on the Pentaho engine
General
Fields
PDI types
Metadata injection support
Using Parquet Input on the Spark engine
General
Fields
Using Get Fields with Parquet partitioned datasets
Spark types
Metadata injection support
Parquet Output
AEL considerations
General
Options
Fields tab
Options tab
Metadata injection support
Pentaho Reporting Output
General
Metadata injection support
Python Executor
Before you begin
General
Options
Script tab
Source panel
Input tab
Row by row processing
All rows processing
Mapping data types from PDI to Python
Output tab
Variable to fields processing
Frames to fields processing
Mapping data types from Python to PDI
Query HCP
Before you begin
General
Options
Query tab
Output tab
See also
Read metadata from HCP
General
Options
See also
Read metadata from Copybook
General
Example
Metadata injection support
Read Metadata
Before you begin
General
Options
Specific Resources
Search Criteria
Advanced Search
Regex Evaluation
General
Capture Group Fields table
Options
Settings tab
Regular expression evaluation window
Content tab
Examples
Replace in String
General
Fields string table
Example: Using regular expression group references
Metadata injection support
See also
REST Client
General
Options
General tab
Authentication tab
SSL tab
Headers tab
Parameters tab
Matrix Parameters tab
Row Denormaliser
General
Group field table
Target fields table
Examples
Metadata injection support
Row Flattener
General
Example
Row Normaliser
General
Fields table
Examples
Metadata injection support
S3 CSV Input
Options
Fields
AWS credentials
Metadata injection support
See also
S3 File Output
Big Data warning
General
Options
File tab
Content tab
Fields tab
AWS credentials
Metadata injection support
See also
Salesforce Delete
General
Options
Connection
Settings
Salesforce Input
General
Options
Settings tab
Connection
Settings
Content tab
Advanced
Additional fields
Other Fields
Fields tab
Metadata injection support
Salesforce Insert
General
Options
Connection
Settings
Output Fields
Fields
Salesforce Update
General
Options
Connection
Settings
Fields
Salesforce Upsert
General
Options
Connection
Settings
Output Fields
Fields
Select Values
General
Options
Select & Alter tab
Edit Mapping
Remove tab
Meta-data tab
Examples
Metadata injection support
Set Field Value
General
Options
Metadata injection support
Set Field Value to a Constant
General
Options
Metadata injection support
Simple Mapping (sub-transformation)
General
Log lines in Kettle
Options
Parameters tab
Input tab
Output tab
Single Threader
General
Options
Options tab
Parameters tab
Sort rows
General
Options
Fields column settings
Metadata injection support
Split Fields
General
Fields table
Example
Metadata injection support
Splunk Input
Prerequisites
AEL considerations
General
Options
Connection tab
Fields tab
Raw field parsing
Date handling
Metadata injection support
Splunk Output
Prerequisites
AEL considerations
General
Options
Connection tab
Event tab
Metadata injection support
SSTable Output
AEL considerations
Options
String Operations
General
The fields to process
Metadata injection support
Strings cut
General
The fields to cut
Example
Metadata injection support
Switch-Case
Options
Example
Metadata injection support
Table Input
AEL considerations
Connect to a Hive database
Connect to an Impala database
General
Options
Example
Metadata injection support
Table Output
AEL considerations
Connect to a Hive database
General
Options
Main options tab
Database fields tab
Enter Mapping window
Metadata injection support
Using Table input to Table output steps with AEL for managed tables in Hive
Create separate input and output KTRs
Create a job to join the KTRs
Text File Input
Select an engine
Using the Text File Input step on the Pentaho engine
General
Options
File tab
Regular expressions
Selected files table
Accept file names
Show action buttons
Content tab
Error Handling tab
Filters tab
Fields tab
Number formats
Scientific notation
Date formats
Additional output fields tab
Metadata injection support
Using the Text File Input step on the Spark engine
General
Options
File tab
Regular expressions
Selected files table
Accept file names
Show action buttons
Content tab
Error Handling tab
Filters tab
Fields tab
Number formats
Scientific notation
Date formats
Additional output fields tab
Metadata injection support
Text File Output
Select an engine
Using the Text File Output step on the Pentaho engine
General
Options
File tab
Content tab
Fields tab
See also
Metadata injection support
Using the Text File Output step on the Spark engine
General
Options
File tab
Content tab
Fields tab
See also
Metadata injection support
Transformation Executor
Error handling and parent transformation logging notes
Samples
General
Options
Parameters tab
Order of processing
Execution results tab
Row grouping tab
Result rows tab
Result files tab
Unique Rows
Select an engine
Using the Unique Rows step on the Pentaho engine
General
Settings
See also
Using the Unique Rows step on the Spark engine
General
Settings
See also
Unique Rows (HashSet)
General
Settings
See also
User Defined Java Class
Not complete Java
General (User Defined Java Class)
Class Code (User Defined Java Class)
Process rows
Error handling
Logging
Class and code fragments
Options
Fields tab
Parameters tab
Info steps tab
Target steps tab
Examples
Metadata injection support
Write metadata to HCP
General
Options
See also
Write Metadata
Before you begin
General
Options
Input tab
Metadata tab
XML Input Stream (StAX)
Samples
Options
Element blocks example
XML Output
General
Options
File tab
Content tab
Fields tab
Metadata injection support
PDI job entries
Amazon EMR Job Executor
Before you begin
General
Options
EMR settings tab
AWS connection
Cluster
Job settings tab
Amazon Hive Job Executor
Before you begin
General
Options
Hive settings tab
AWS connection
Cluster
Job settings tab
Bulk load into Amazon Redshift
Before you begin
General
Options
Input tab
Output tab
Options tab
Parameters tab
Bulk load into Azure SQL DB
Before you begin
General
Options
Input tab
Output tab
Options tab
Advanced options tab
Bulk load into Databricks
General
Options
Input tab
Output tab
Bulk load into Snowflake
Before you begin
General
Options
Input tab
Output tab
Options tab
Advanced options tab
Create Snowflake warehouse
General
Options
Database connection and warehouse
Warehouse settings
Cluster settings
Activity settings
Delete Snowflake warehouse
General
Options
File Exists (Job Entry)
Google BigQuery loader
Before you begin
General
Options
Setup tab
File tab
Hadoop Copy Files
General
Options
Files/Folders tab
Settings tab
Job (job entry)
General
Options
Options tab
Logging tab
Argument tab
Parameters tab
Kafka Offset
General
Options
Setup tab
Options tab
Offset Settings tab
Examples
Modify Snowflake warehouse
General
Options
Database connection and warehouse
Warehouse settings
Cluster settings
Activity settings
Pentaho MapReduce
General
Options
Mapper tab
Combiner tab
Reducer tab
Job Setup tab
Cluster tab
Hadoop cluster configuration
User Defined tab
Use PDI outside and inside the Hadoop cluster
Pentaho MapReduce workflow
PDI Transformation
PDI Job
PDI Hadoop job workflow
Hadoop to PDI data type conversion
Hadoop Hive-specific SQL limitations
Big data tutorials
Spark Submit
Before you begin
Install and configure Spark client for PDI use
Spark version 2.x.x
General
Options
Files tab
Java or Scala
Python
Arguments tab
Options tab
Troubleshooting your configuration
Running a Spark job from a Windows machine
Sqoop Export
General
Quick Setup mode
Advanced Options mode
Sqoop Import
General (Sqoop Import)
Quick Setup mode
Advanced Options mode
Start Snowflake warehouse
General
Options
Stop Snowflake warehouse
General
Options
Transformation (job entry)
General
Options
Options tab
Logging tab
Arguments tab
Parameters tab
Step Name: Specifies the unique name of the Get rows from result transformation step on the canvas. You can customize the name or leave it as the default.
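For reference, the Step Name value is what PDI stores in the name element of the step's definition when the transformation is saved as a .ktr (XML) file. The indented lines below are a minimal, illustrative sketch of that structure, not taken from this document; the RowsFromResult type id and the surrounding elements are assumptions and should be checked against a transformation saved from your own installation.

    <step>
      <name>Get rows from result</name>  <!-- the Step Name shown in the step dialog -->
      <type>RowsFromResult</type>        <!-- assumed internal id for this step type -->
      <!-- remaining step settings omitted -->
    </step>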