You can use the Read Metadata step to search for and retrieve existing metadata in Pentaho Data Catalog that is associated with specific data resources registered in Data Catalog.
Specifically, you can create a transformation that searches Data Catalog for existing metadata that points to CSV and Parquet files stored in HDFS or Amazon S3. You can then pass the associated metadata, including the location of the data, to other steps in your transformation for processing.
For example, you could use the Read Metadata step to retrieve the metadata for a data file's cluster location and then pass that metadata to a Text File Input step or a Catalog Input step, which retrieves the file's contents for an ETL operation. The transformation can then write the new data contents back to the original file or to a new file.
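The search-then-pass pattern described above can be modeled outside PDI as a rough sketch. This is not the Read Metadata step's API: the `MetadataRecord` type, its fields, and the `search_metadata` and `to_input_rows` helpers are hypothetical stand-ins. The sketch keeps only CSV and Parquet resources whose location is on HDFS or S3, then shapes each match into a row that carries the file location, which is the piece of metadata a downstream input step would need to read the file.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


# Hypothetical stand-in for one metadata record returned by a Data Catalog search.
# Field names are illustrative assumptions, not the step's actual output fields.
@dataclass(frozen=True)
class MetadataRecord:
    name: str
    file_format: str  # e.g. "csv" or "parquet"
    location: str     # e.g. "hdfs://..." or "s3://..."
    tags: tuple = ()


SUPPORTED_FORMATS = {"csv", "parquet"}
SUPPORTED_SCHEMES = {"hdfs", "s3"}


def search_metadata(records, file_format=None, tag=None):
    """Return CSV/Parquet records stored on HDFS or S3, optionally filtered by format and tag."""
    results = []
    for rec in records:
        # The URL scheme of the location tells us where the data lives.
        if urlparse(rec.location).scheme not in SUPPORTED_SCHEMES:
            continue
        if rec.file_format not in SUPPORTED_FORMATS:
            continue
        if file_format is not None and rec.file_format != file_format:
            continue
        if tag is not None and tag not in rec.tags:
            continue
        results.append(rec)
    return results


def to_input_rows(records):
    """Shape matched metadata into rows a downstream input step could consume.

    Each row carries the file location so that a step such as Text File Input
    could read the file's contents. The row layout is an assumption.
    """
    return [{"filename": r.location, "format": r.file_format, "source": r.name} for r in records]


# Usage with made-up catalog entries.
catalog = [
    MetadataRecord("sales_2023", "csv", "hdfs://namenode:8020/data/sales_2023.csv", ("sales",)),
    MetadataRecord("events", "parquet", "s3://example-bucket/events/part-0000.parquet", ("events",)),
    MetadataRecord("notes", "txt", "s3://example-bucket/notes.txt"),
    MetadataRecord("local_copy", "csv", "file:///tmp/sales.csv", ("sales",)),
]
rows = to_input_rows(search_metadata(catalog, tag="sales"))
```

In this model, the plain-text file and the local `file://` copy are excluded, so only the HDFS CSV file tagged `sales` is handed on to the next step.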
The Read Metadata step includes search options to identify, locate, and retrieve the metadata associated with the data resources available in Data Catalog.
For more information about accessing Pentaho Data Catalog in PDI, see PDI and Data Catalog.