Manage data identification methods

Administer Pentaho Data Catalog

Version
10.0.x
Part Number
MK-95PDC002-00

Pentaho Data Catalog provides two kinds of built-in data identification methods: dictionaries and data patterns. A dictionary is a list of words used to create bitsets, HyperLogLogs (HLLs), and data patterns, which are then used to match column data through bitset matching. A pattern defines the data pattern, regular expression, column alias, and tags used to identify a data column. You can also import custom dictionary and data pattern configuration files that better suit your organization's needs.
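To make the difference between the two methods concrete, the following Python sketch scores a column against a word list and against a regular expression. It is an illustration only, not the Data Catalog implementation: the product relies on bitsets and HLLs, whereas this sketch substitutes a plain Python set, and every name in it (`dictionary_match_ratio`, `pattern_match_ratio`, `SSN_LIKE`, the sample data) is hypothetical.

```python
import re


def _clean(column_values):
    """Drop empty values and surrounding whitespace (assumed preprocessing)."""
    return [v.strip() for v in column_values if v and v.strip()]


def dictionary_match_ratio(column_values, dictionary):
    """Fraction of non-empty column values found in the word list.

    A plain set stands in for the bitset that the product builds
    from a dictionary; matching here is case-insensitive by assumption.
    """
    words = {w.strip().lower() for w in dictionary}
    values = [v.lower() for v in _clean(column_values)]
    if not values:
        return 0.0
    return sum(1 for v in values if v in words) / len(values)


def pattern_match_ratio(column_values, regex):
    """Fraction of non-empty column values that fully match the regex."""
    compiled = re.compile(regex)
    values = _clean(column_values)
    if not values:
        return 0.0
    return sum(1 for v in values if compiled.fullmatch(v)) / len(values)


# Hypothetical sample data.
countries = ["France", "Germany", "Japan", "Brazil"]
column_a = ["france", "Japan", "Narnia", "Brazil"]
column_b = ["123-45-6789", "987-65-4321", "n/a"]
SSN_LIKE = r"\d{3}-\d{2}-\d{4}"

dict_ratio = dictionary_match_ratio(column_a, countries)  # 3 of 4 values match
pat_ratio = pattern_match_ratio(column_b, SSN_LIKE)       # 2 of 3 values match
```

A dictionary suits columns whose values come from a closed vocabulary, such as country names, while a pattern suits values that follow a fixed format, such as identifiers. In both cases a ratio like the ones above could be compared against a threshold to decide whether a column should be tagged.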