Processing structured data

Version
10.1.x
Part Number
MK-95PDC000-02

To process structured data, you must perform Metadata Ingest, Data Profiling, and Data Identification.

Perform the following steps:
  1. Select the structured resource you want to investigate in Data Canvas.
    The resource can be a table or a column.
  2. Click Process.
    The Choose Process page opens with the Metadata Ingest, Data Profiling, and Data Identification options. For Microsoft SQL and Oracle databases, an additional option, Usage Statistics, is also available.
  3. In the Metadata Ingest card, click Start to begin the metadata ingest process.
    You can view the status of metadata ingest on the Manage Workers page.
  4. To perform data profiling, click the Data Profiling card.
    The Profiling page opens with options to configure data profiling.
    Note: When configuring data profiling, use the default settings where possible, because they are suitable for most situations.
    Field               Description
    Extract samples     Extracts sample data during profiling and displays it in the summary tab.
    Skip Recent (days)  Skips profiling for recently profiled tables. For example, if the days field is set to 7, any table profiled within the last 7 days is skipped.
  5. To perform data identification, click the Data Identification card.
    Important: You must perform data profiling before proceeding with data identification. If data profiling has not been performed, Data Catalog marks it as Required. In that case, you can start data profiling from the Data Identification card by clicking Start.

  6. Click Select Methods, select Dictionaries and Patterns, click Apply, and then click Start.
    You can view the status of data identification on the Manage Workers page.
  7. (Optional) If you're working with Microsoft SQL or Oracle databases and want to collect usage statistics, click the Usage Statistics card, choose a date range, and then click Start.
    You can view the status of the Entity Usage process on the Manage Workers page. After the process completes, the gathered information is available in the Usage Statistics View collection in the Business Intelligence Database.
  8. Go to Data Canvas to view tags.
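
The two rules described in the steps above, which options appear on the Choose Process page and how Skip Recent (days) decides whether a table is profiled again, can be illustrated with a short Python sketch. This is not Data Catalog code or a Data Catalog API: the function names, the database type strings, and the treatment of a table profiled exactly N days ago are all assumptions made for illustration only.

```python
from datetime import datetime, timedelta
from typing import List, Optional

# Hypothetical labels for illustration; not identifiers used by Data Catalog.
USAGE_STATISTICS_SOURCES = {"Microsoft SQL", "Oracle"}


def available_processes(database_type: str) -> List[str]:
    """Return the options described for the Choose Process page."""
    processes = ["Metadata Ingest", "Data Profiling", "Data Identification"]
    # Usage Statistics is described only for Microsoft SQL and Oracle databases.
    if database_type in USAGE_STATISTICS_SOURCES:
        processes.append("Usage Statistics")
    return processes


def should_skip_profiling(last_profiled: Optional[datetime],
                          skip_recent_days: int,
                          now: datetime) -> bool:
    """Return True if the table was profiled within the last skip_recent_days days."""
    if last_profiled is None:
        return False  # A table that was never profiled is not skipped.
    # Assumption: a table profiled exactly skip_recent_days ago is no longer "recent".
    return now - last_profiled < timedelta(days=skip_recent_days)
```

For example, with Skip Recent (days) set to 7 and the current date taken as 15 June 2024, a table last profiled on 12 June would be skipped, while a table last profiled on 1 June would be profiled again:

```python
now = datetime(2024, 6, 15)
print(should_skip_profiling(datetime(2024, 6, 12), 7, now))  # True
print(should_skip_profiling(datetime(2024, 6, 1), 7, now))   # False
```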