Processing data

Processing data involves the steps needed to extract meaningful insights from your data and to ensure that it is used effectively. Two significant stages in this process are Metadata Ingest and Data Profiling, which apply to both structured and unstructured data. For structured data, processing also involves Data Identification.

Metadata Ingest

Metadata Ingest ingests the metadata for file system, object store, and JDBC data sources.

Data Profiling

Data Profiling is a crucial step in any data analysis. In this process, Data Catalog examines file and JDBC data sources and gathers statistics about the data. It profiles data in the cluster and uses its algorithms to compute detailed properties, including field-level data quality metrics and data statistics.
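
To make the idea of field-level statistics concrete, the following is a minimal Python sketch of the kind of properties a profiler might compute for a single column. It is an illustration only and does not represent the algorithms Data Catalog uses; the function name `profile_column` and the sample data are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Compute simple field-level statistics for one column (illustrative only)."""
    total = len(values)
    # Treat None and empty strings as missing values.
    non_null = [v for v in values if v is not None and v != ""]
    counts = Counter(non_null)
    return {
        "row_count": total,
        "null_count": total - len(non_null),
        "distinct_count": len(counts),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


# Hypothetical sample column with one missing value.
ages = [34, 27, None, 34, 51]
stats = profile_column(ages)
print(stats)
```

Statistics such as the null count and distinct count are typical inputs to data quality metrics, for example the share of populated values in a field.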

Data Identification

Data Identification is an essential process in managing structured data. It involves tagging data to make it easier to search, retrieve, and analyze. By associating dictionaries and data patterns with tables and columns, you can ensure that data is appropriately categorized and easily accessed when needed.
You must run Data Profiling before you begin any Data Identification activities.
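
As a rough illustration of tagging columns with dictionaries and data patterns, the Python sketch below assigns a tag to a column when enough of its values match a dictionary of known terms or a regular expression. This is a simplified sketch under stated assumptions, not Data Catalog's implementation: the function `identify_column`, the 0.8 match threshold, and the sample dictionary and pattern are all hypothetical.

```python
import re


def identify_column(values, dictionaries, patterns, threshold=0.8):
    """Return tags whose dictionary or pattern matches enough column values.

    The threshold is an assumed cutoff chosen for illustration.
    """
    tags = []
    sample = [str(v) for v in values if v is not None]
    if not sample:
        return tags
    # Dictionary match: the value (case-insensitive) is one of the known terms.
    for tag, terms in dictionaries.items():
        hits = sum(1 for v in sample if v.lower() in terms)
        if hits / len(sample) >= threshold:
            tags.append(tag)
    # Pattern match: the whole value matches the regular expression.
    for tag, regex in patterns.items():
        compiled = re.compile(regex)
        hits = sum(1 for v in sample if compiled.fullmatch(v))
        if hits / len(sample) >= threshold:
            tags.append(tag)
    return tags


# Hypothetical dictionary and data pattern.
dictionaries = {"country": {"france", "japan", "brazil"}}
patterns = {"email": r"[^@\s]+@[^@\s]+\.[^@\s]+"}

country_tags = identify_column(["France", "Japan", "Brazil", "France"], dictionaries, patterns)
email_tags = identify_column(["a@example.com", "b@example.org", None], dictionaries, patterns)
print(country_tags, email_tags)
```

Matching against column values is one reason profiling comes first in a workflow like this: identification relies on information about the data that has already been gathered.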