Data pattern analysis

Get started with Pentaho Data Catalog

Version
10.1.x
Part Number
MK-95PDC001-02

Data quality analysis relies on two fundamental techniques: checking data against a regular expression, and statistically analyzing the data itself to find its common patterns and its outlier patterns (which could indicate bad data).

The data identification process generates roughly the 20 most common patterns, which capture the characteristics of the data. You can then use these patterns, along with their statistical frequency and supplementary information, to generate regular expression (RegEx) recommendations for your data, and you can tune each RegEx to meet your specific needs. Alternatively, you can select the valid patterns, so that subsequent data quality checks identify any data entries that fall outside the accepted patterns.
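The general idea behind pattern analysis can be illustrated with a minimal Python sketch. This is not how Data Catalog is implemented; it only assumes a simple convention in which each ASCII letter is reduced to `A`, each ASCII digit to `9`, and every other character is kept as-is. The function names (`to_pattern`, `top_patterns`, `pattern_to_regex`, `find_outliers`) and the sample values are hypothetical and chosen for illustration only.

```python
import re
import string
from collections import Counter
from itertools import groupby


def to_pattern(value: str) -> str:
    """Reduce a value to its shape: letters -> 'A', digits -> '9', others unchanged.

    Assumption: this character-class convention is illustrative only.
    """
    out = []
    for ch in value:
        if ch in string.ascii_letters:
            out.append("A")
        elif ch in string.digits:
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def top_patterns(values, limit=20):
    """Return up to `limit` (pattern, count, frequency) tuples, most common first."""
    counts = Counter(to_pattern(v) for v in values)
    total = sum(counts.values())
    return [(p, n, n / total) for p, n in counts.most_common(limit)]


def pattern_to_regex(pattern: str) -> str:
    """Turn a pattern into a regex string, collapsing runs into {n} quantifiers."""
    parts = []
    for symbol, run in groupby(pattern):
        length = len(list(run))
        if symbol == "A":
            body = "[A-Za-z]"
        elif symbol == "9":
            body = "[0-9]"
        else:
            body = re.escape(symbol)  # literal separator such as '-' or '/'
        parts.append(body + (f"{{{length}}}" if length > 1 else ""))
    return "".join(parts)


def find_outliers(values, accepted_patterns):
    """Return the values whose pattern is not in the accepted set."""
    return [v for v in values if to_pattern(v) not in accepted_patterns]


# Hypothetical sample column: mostly one shape, plus two suspicious entries.
sample = ["123-45-6789", "987-65-4321", "555-12-3456", "12345678", "N/A"]

for pattern, count, frequency in top_patterns(sample):
    print(pattern, count, f"{frequency:.0%}")

# Recommend a regex from the dominant pattern, then flag non-conforming values.
dominant = top_patterns(sample)[0][0]
recommended = pattern_to_regex(dominant)
print(recommended)
print(find_outliers(sample, {dominant}))
```

In this sketch, the dominant pattern plays the role of a selected valid pattern: any value whose shape falls outside the accepted set is reported as a candidate for bad data. The generated expression is a starting point that you would normally tune, for example by relaxing a fixed run length, and it is intended to be applied with `re.fullmatch` so that the whole value must conform.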

After running data identification, you can use the Galaxy View feature to visualize the data tagging, identify the data flow, locate your data, and view the sensitivity and security of that data.