Processing data involves several steps that make the data usable and help extract meaningful insights from it. Two significant stages in this process are Metadata Ingest and Data Profiling, particularly when working with both structured and unstructured data. For structured data, processing also includes Data Identification.
Metadata Ingest
This step ingests the metadata for file system object stores and JDBC data sources.
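As an illustration of what ingesting metadata from a file-based source can look like, the following minimal Python sketch walks a local directory and records descriptive attributes for each file without reading its contents. It is not the product's ingest implementation. The function name `ingest_file_metadata` and the record fields are assumptions chosen for this example.

```python
import os

def ingest_file_metadata(root):
    """Walk `root` and return one metadata record per file.

    Hypothetical helper for illustration only: it collects descriptive
    attributes (path, size, modification time, extension) and never
    opens the files themselves.
    """
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            records.append({
                "path": os.path.relpath(path, root),   # path relative to the scanned root
                "size_bytes": info.st_size,             # file size in bytes
                "modified": info.st_mtime,              # last modification time (epoch seconds)
                "extension": os.path.splitext(name)[1].lower(),
            })
    # Sort so the result is stable regardless of directory traversal order.
    return sorted(records, key=lambda r: r["path"])
```

Calling `ingest_file_metadata("/some/directory")` returns a list of dictionaries, one per file. A JDBC source would instead be described by querying the database's own catalog for tables and columns.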
Data Profiling
Data Profiling is a crucial step for any data analysis. It is the process in which Data Catalog examines file and JDBC data sources and gathers statistics about the data. Data Catalog profiles data in the cluster and uses its algorithms to compute detailed properties, including field-level data quality metrics and data statistics.
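To make "field-level data quality metrics and data statistics" concrete, the following minimal Python sketch computes a few common per-field measures over a small set of rows. It does not reproduce Data Catalog's algorithms, which are not described here. The function name `profile_fields` and the chosen metrics (null count, distinct count, completeness, and min/max/mean for numeric fields) are assumptions for illustration.

```python
def profile_fields(rows):
    """Compute simple per-field statistics over a list of dict rows.

    Hypothetical helper for illustration only. A value counts as missing
    when it is None, an empty string, or absent from the row.
    """
    fields = sorted({key for row in rows for key in row})
    profile = {}
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        # Treat ints and floats as numeric, but not booleans.
        numeric = [
            v for v in present
            if isinstance(v, (int, float)) and not isinstance(v, bool)
        ]
        stats = {
            "count": len(values),
            "null_count": len(values) - len(present),
            "distinct_count": len(set(present)),
            "completeness": len(present) / len(values) if values else 0.0,
        }
        # Report numeric statistics only when every present value is numeric.
        if numeric and len(numeric) == len(present):
            stats["min"] = min(numeric)
            stats["max"] = max(numeric)
            stats["mean"] = sum(numeric) / len(numeric)
        profile[field] = stats
    return profile
```

For example, `profile_fields([{"id": 1, "city": "Oslo"}, {"id": 2, "city": ""}])` reports one missing value for `city` and numeric statistics for `id`. A production profiler would typically run such computations in a distributed fashion across the cluster rather than in a single process.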