Read Metadata

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15

You can use the Read Metadata step to search Pentaho Data Catalog and retrieve existing metadata associated with specific data resources registered in Data Catalog.

Specifically, you can create a transformation that searches Data Catalog for existing metadata that points to CSV files and Parquet files stored in HDFS or Amazon S3. You can then pass all the associated metadata, including the location of the data, to other steps within your transformation for processing.

For example, you could use the Read Metadata step to retrieve the metadata for a data file's cluster location, and then pass that metadata to a Text File Input step or a Catalog Input step that retrieves the file’s contents for an ETL operation on the data. The transformation can then write the new data contents back to the file or to a new file.
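
The search-then-read pattern described above can be sketched outside of PDI as follows. This is only an illustrative sketch, not the Data Catalog API or the step's implementation: the `ResourceMetadata` record, the `search_metadata` and `locations_for_input_step` functions, and the sample URIs are all hypothetical names invented for this example. It shows filtering catalog records by format and tag, then handing the resulting file locations to a downstream reader.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a metadata record; NOT the Data Catalog schema.
@dataclass(frozen=True)
class ResourceMetadata:
    name: str
    file_format: str   # for example "csv" or "parquet"
    location: str      # for example an HDFS or Amazon S3 URI
    tags: tuple = ()


# The step currently supports only CSV text files and Parquet files.
SUPPORTED_FORMATS = {"csv", "parquet"}


def search_metadata(catalog, file_format=None, tag=None):
    """Return supported records that match the optional format and tag filters."""
    results = []
    for record in catalog:
        if record.file_format not in SUPPORTED_FORMATS:
            continue  # skip resources in formats the step cannot read
        if file_format is not None and record.file_format != file_format:
            continue
        if tag is not None and tag not in record.tags:
            continue
        results.append(record)
    return results


def locations_for_input_step(records):
    """Extract the locations a downstream input step would read from."""
    return [record.location for record in records]


# Made-up sample catalog contents, for illustration only.
catalog = [
    ResourceMetadata("sales_2022", "csv",
                     "hdfs://example-cluster/data/sales_2022.csv", ("sales",)),
    ResourceMetadata("sales_2022_pq", "parquet",
                     "s3://example-bucket/sales/2022.parquet", ("sales",)),
    ResourceMetadata("hr_notes", "json",
                     "hdfs://example-cluster/data/hr_notes.json", ("hr",)),
]

csv_sales = search_metadata(catalog, file_format="csv", tag="sales")
print(locations_for_input_step(csv_sales))
```

In an actual transformation, the Read Metadata step plays the role of `search_metadata`, and the location field it outputs is passed along the hop to a Text File Input or Catalog Input step.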

The Read Metadata step includes search options to identify, locate, and retrieve the metadata associated with the available data resources listed in Data Catalog.

For more information about accessing Pentaho Data Catalog in PDI, see PDI and Data Catalog.

Note: This step is supported on the PDI engine but not on the Spark engine. Only the CSV text file and Parquet data formats are currently supported. You must have role permissions set in Data Catalog to read the data resources.