Advanced topics

Pentaho Data Integration

Version: 9.3.x
Part Number: MK-95PDIA003-15

The following topics help to extend your knowledge of PDI beyond basic setup and use:

  • PDI and Hitachi Content Platform (HCP)

    You can use PDI transformation steps to improve your HCP data quality before storing the data in other formats, such as JSON, XML, or Parquet.

  • PDI and Data Catalog

    You can use PDI transformation steps to read metadata from or write metadata to Pentaho Data Catalog (PDC). You can also build a transformation to create and describe a new data resource in PDC.

  • PDI and Snowflake

    Using PDI job entries for Snowflake, you can load your data into Snowflake and orchestrate warehouse operations.

  • Use Command Line Tools

    You can use PDI's command line tools to execute PDI content from outside the PDI client (see the command line example after this list).

  • Metadata Injection

    You can inject metadata from various sources into a template transformation at runtime, reducing repetitive ETL development.

  • Use Carte Clusters

    You can use Carte to build a simple web server that allows you to run transformations and jobs remotely (see the Carte example after this list).

  • Use Adaptive Execution Layer (AEL)

    You can use AEL to run transformations in different execution engines, such as Apache Spark.

  • Partition Data

    Split a data set into a number of subsets according to a rule that is applied to each row of data.

  • Use a Data Service

    Query the output of a step as if the data were stored in a physical table by turning a transformation into a data service (see the data service query example after this list).

  • Use the Marketplace

    Download, install, and share plugins developed by Pentaho and members of the user community.

  • Use Data Lineage

    Track your data from source systems to target applications and take advantage of third-party tools, such as Meta Integration Technology (MITI) and yEd, to track and view specific data.

  • Connecting to a Hadoop cluster with the PDI client

    Use transformation steps to connect to a variety of big data sources, including Hadoop, NoSQL databases such as MongoDB, and analytical databases.

  • Use Streamlined Data Refinery (SDR)

    You can use SDR to build a simplified and specific ETL refinery composed of a series of PDI jobs that take raw data, augment and blend it through the request form, and then publish it for use in Analyzer.
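
Command line example: the Pan tool runs transformations and the Kitchen tool runs jobs. The sketch below assumes a Linux install and hypothetical file paths and parameter names; on Windows, use Pan.bat and Kitchen.bat instead:

    # Run a transformation at the Basic log level, passing a named parameter
    ./pan.sh -file=/opt/etl/load_sales.ktr -level=Basic -param:INPUT_DIR=/data/in

    # Run a job at the same log level
    ./kitchen.sh -file=/opt/etl/nightly.kjb -level=Basic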
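
Carte example: you can start a simple Carte server from the command line with a host name and port, both illustrative here. The default user and password are cluster/cluster unless you change them:

    # Start a simple Carte server on port 8081
    ./carte.sh localhost 8081

    # Verify that it is running by opening the status page:
    # http://localhost:8081/kettle/status/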
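
Data service query example: once a transformation is published as a data service, you can query it with standard SQL through the PDI thin JDBC driver. This is a minimal sketch; the service name my_service, the host di-server:9080, and the credentials are all hypothetical, and you should verify the driver class and URL options against your installation:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class QueryDataService {
        public static void main(String[] args) throws Exception {
            // Thin JDBC driver shipped with the PDI data service plugin (assumed class name)
            Class.forName("org.pentaho.di.trans.dataservice.jdbc.ThinDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:pdi://di-server:9080/kettle?webappname=pentaho-di",  // assumed URL format
                     "admin", "password");                                      // placeholder credentials
                 Statement stmt = conn.createStatement();
                 // The data service name is queried as if it were a physical table
                 ResultSet rs = stmt.executeQuery("SELECT * FROM my_service WHERE region = 'EMEA'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }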

Note: If you want to develop custom plugins that extend PDI functionality or embed the engine into your own Java applications, see Try Pentaho Data Integration and Analytics.
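
As a brief taste of embedding, the following sketch runs an existing transformation from a Java application using the Kettle API; the .ktr path is hypothetical and error handling is kept minimal:

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class RunTransformation {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();                                   // initialize the PDI engine once per JVM
            TransMeta meta = new TransMeta("/opt/etl/load_sales.ktr");  // parse the transformation file
            Trans trans = new Trans(meta);
            trans.execute(null);                                        // start all step threads
            trans.waitUntilFinished();                                  // block until the run completes
            if (trans.getErrors() > 0) {
                throw new RuntimeException("Transformation finished with errors");
            }
        }
    }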