The following PDI transformation steps are implemented to run natively on Spark, so you can use them to access, process, and analyze large datasets with big data technologies (a brief illustrative sketch follows the list).
- Abort
- AMQP Consumer
- Avro Input
- Avro Output
- Copy rows to result
- Dummy (do nothing)
- ETL Metadata Injection
- Filter Rows
- Get records from stream
- Get rows from result
- Group By
- Hadoop File Input
- Hadoop File Output
- HBase Input
- HBase Output
- Java filter
- Join Rows (Cartesian product)
- Kafka Consumer
- Mapping (Sub-transformation)
- Mapping Input Specification
- Mapping Output Specification
- Memory Group By
- Merge Join
- Merge Rows (diff)
- MQTT Consumer
- ORC Input
- ORC Output
- Parquet Input
- Parquet Output
- Simple Mapping
- Sort rows
- Stream Lookup
- Switch / Case
- Table Input
- Table Output
- Text File Input
- Text File Output
- Transformation Executor
- Unique Rows
- Unique Rows (HashSet)
- Write to Log
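
For intuition only, many of these steps correspond conceptually to operations on a Spark DataFrame. The minimal PySpark sketch below shows what a Filter Rows style operation looks like when expressed directly in Spark; it is not how PDI executes the step, and the file path and column name are hypothetical examples.

```python
# Illustrative sketch: a rough Spark analogue of a "Filter Rows" step.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("filter-rows-sketch").getOrCreate()

# Read a Parquet dataset (comparable to what a Parquet Input step provides).
orders = spark.read.parquet("/data/orders")  # hypothetical path

# Keep only the rows that match a condition, as a Filter Rows step would.
large_orders = orders.filter(orders["amount"] > 100)  # hypothetical column

large_orders.show()

spark.stop()
```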