Summary tab

In Data Catalog, you can view metadata in graphical formats, such as value histograms and unique value counts, to help you analyze data quickly. You can also view sample values and profiled samples.

To open a data type profile, navigate to the column in the resource you want to view and click it to explore the field-level data.

When viewing column details, you can see the resource's field-level metadata along with data analysis, cardinality for fields, and sample values. To see metadata for a resource field, you need native access at the resource or metadata level, as governed by the RBAC settings for your user role.

Depending on the selected resource level or data element, you can view different summaries of information, including the following resource metrics:

Description
Displays a description of the resource that is imported from the source. You can contribute resource information to the knowledge base by writing content and including links to other articles in Data Catalog. To edit the description, click Edit Description, which opens a dialog box where you can format the text using tools like bold, italic, underline, and strikeout. You can also align text, insert code blocks, and add links as needed.
System Information
When you select an unstructured file, this pane displays the timestamps for file creation, modification, and last access.
  • In certain file systems, when a file's modification date is earlier than its creation date, certain APIs, like the SMB network client, might display the more recent date as the modification date.
  • In NFS and CIFS data sources, when you modify a file, Data Catalog might display the same timestamp for both Date Created and Date Last Modified fields.
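The timestamp behavior described above can be checked outside Data Catalog. The following is a minimal sketch, not Data Catalog code, that reads the three timestamps with Python's standard `os.stat` call. The function names `file_timestamps` and `modified_before_created` are hypothetical, and a creation time is only reported on platforms that expose `st_birthtime`.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_timestamps(path):
    """Return created/modified/accessed times for a file as UTC datetimes."""
    st = os.stat(path)

    def to_dt(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    # st_birthtime exists only on some platforms (for example macOS and BSD);
    # where it is missing, the creation time is reported as None.
    birth = getattr(st, "st_birthtime", None)
    return {
        "created": to_dt(birth) if birth is not None else None,
        "modified": to_dt(st.st_mtime),
        "accessed": to_dt(st.st_atime),
    }


def modified_before_created(timestamps):
    """True when the modification time is earlier than the creation time."""
    created = timestamps["created"]
    return created is not None and timestamps["modified"] < created


# Demonstration on a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"example")
    demo_path = handle.name

demo = file_timestamps(demo_path)
print(demo, modified_before_created(demo))
os.remove(demo_path)
```

Running a check like this directly against the file system can help confirm whether an unexpected date comes from the data source rather than from Data Catalog.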
When you select a table, you can view the Field Count and Row Count statistics. The following table identifies the key details available in the Statistics pane when you select a column in a table to view:
Feature       Description
Null Count    The number of entries that are null.
Cardinality   The number of unique values in a field. A low cardinality indicates many repeated values.
HLL           A HyperLogLog (HLL) estimate of the cardinality of the data, with a margin of error of roughly 2%.
Blank Count   The number of entries that are blank.
Min Width     The minimum number of characters in a value in the column.
Max Width     The maximum number of characters in a value in the column.
Avg Width     The average number of characters in a value in the column.
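To make the statistics above concrete, the following sketch computes them for a small in-memory column. It is an illustration only, not how Data Catalog profiles data. It assumes that `None` represents a null entry, that "blank" means an empty or whitespace-only string, and that widths are measured over non-null, non-blank values. The `hll_estimate` function is a textbook HyperLogLog with 4,096 registers (`p=12`), which has a standard error of about 1.6%; the register count Data Catalog uses is not stated in this document.

```python
import hashlib
import math


def column_statistics(values):
    """Exact statistics for one column; None stands in for a null entry."""
    non_null = [v for v in values if v is not None]
    # Assumption: a blank entry is an empty or whitespace-only string.
    non_blank = [v for v in non_null if str(v).strip() != ""]
    widths = [len(str(v)) for v in non_blank]
    return {
        "null_count": len(values) - len(non_null),
        "blank_count": len(non_null) - len(non_blank),
        "cardinality": len(set(non_blank)),
        "min_width": min(widths) if widths else 0,
        "max_width": max(widths) if widths else 0,
        "avg_width": sum(widths) / len(widths) if widths else 0.0,
    }


def hll_estimate(values, p=12):
    """Approximate the number of distinct values with HyperLogLog."""
    m = 1 << p                      # number of registers
    registers = [0] * m
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")    # 64-bit hash value
        index = h >> (64 - p)                    # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1-bit
        registers[index] = max(registers[index], rank)
    alpha = 0.7213 / (1 + 1.079 / m)
    estimate = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if estimate <= 2.5 * m and zeros:
        # Small-range correction (linear counting).
        estimate = m * math.log(m / zeros)
    return round(estimate)


column = ["KA12345", "KL12345", None, "", "KA12345", "KT123456"]
print(column_statistics(column))
print(hll_estimate(f"id-{i}" for i in range(10000)))
```

The exact cardinality needs every distinct value held in memory, while the HyperLogLog estimate uses a fixed number of registers regardless of column size, which is why an estimate with a small margin of error is practical for large tables.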
Data Patterns
In Data Catalog, data pattern analysis offers recommendations based on detected patterns and their frequency. These recommendations include regular expressions (RegEx) at different levels of pattern-matching precision: loose, moderate, and strict. Data Catalog gives you the flexibility to choose the most appropriate patterns. Simplifying the patterns by focusing on just the characters 'A', 'a', 'n', and 's' reveals the underlying data patterns more clearly. After obtaining a set of simplified patterns along with their respective frequency counts, candidate regular expressions can be generated. The following options demonstrate possible regular expressions tailored to the desired level of strictness:
Pattern                   Description
^\w{2}\d{5}$              Loose Pattern: This pattern is less strict and excludes the last value in the example with 80% confidence.
^[K]\w\d{5}$              Strict first letter and five digits: This expression maintains strict criteria for the first letter while allowing for variability in the subsequent characters.
^[K]\w\d{5,6}$            Loose on the second character: This pattern ensures 100% confidence but introduces flexibility for the second character.
^[K][A,L,T,W]\d{5,6}$     More Strict Pattern: This expression imposes stricter conditions while maintaining 100% confidence.
^[A-Z][A-Z]\d{5,6}$       Another 100% confidence pattern that differs in its structure.
If your user role does not grant access to the field or viewing level of the information, the Data Patterns pane does not appear.
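The pattern analysis described in this section can be approximated with a short sketch. The code below is not Data Catalog's implementation. It assumes that 'A' stands for an uppercase letter, 'a' for a lowercase letter, 'n' for a digit, and 's' for any other character, and that "confidence" means the share of sample values a regular expression matches. The sample values are hypothetical, chosen so that the last one has six digits.

```python
import re
from collections import Counter


def simplify(value):
    """Reduce a value to its shape using the characters A, a, n, and s."""
    shape = []
    for ch in value:
        if ch.isdigit():
            shape.append("n")      # assumption: n = digit
        elif ch.isalpha() and ch.isupper():
            shape.append("A")      # assumption: A = uppercase letter
        elif ch.isalpha():
            shape.append("a")      # assumption: a = lowercase letter
        else:
            shape.append("s")      # assumption: s = anything else
    return "".join(shape)


def confidence(pattern, values):
    """Share of values matched by the pattern (hypothetical definition)."""
    regex = re.compile(pattern)
    return sum(1 for v in values if regex.search(v)) / len(values)


# Hypothetical sample values, not taken from Data Catalog.
samples = ["KA12345", "KL12345", "KT12345", "KW12345", "KA123456"]

print(Counter(simplify(v) for v in samples))
for candidate in [
    r"^\w{2}\d{5}$",
    r"^[K]\w\d{5}$",
    r"^[K]\w\d{5,6}$",
    r"^[K][A,L,T,W]\d{5,6}$",
    r"^[A-Z][A-Z]\d{5,6}$",
]:
    print(candidate, confidence(candidate, samples))
```

With these samples, the two patterns that require exactly five digits match four of the five values (0.8), and the patterns that allow five or six digits match all of them (1.0). Note that a comma inside a character class is a literal character in standard regular expression syntax, so `[A,L,T,W]` also accepts a comma; `[ALTW]` is the tighter form if only those four letters are intended.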
Sample Data
Shows random sample values for the field, along with their frequency and distribution, when you view a column. Text names and values are truncated after 200 characters. You can identify resources that have been sample-profiled and other resource-level information.
To view this pane, your role must allow Sample Data Access through native system permissions. If your user role has administrative privileges, you can configure these values. If not, contact your administrator for details.
Important: Data Catalog governs access to view sample data with the View samples permission. Users with this permission can see sample data, but users without it see the sample data in a masked format, such as ****** ** **, ensuring sensitive information remains protected.
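As a rough illustration of the behavior described in this section, the sketch below builds a frequency view of sample values, truncates each value at 200 characters, and masks values for users without permission. It is not Data Catalog code. The names `mask`, `sample_view`, and `can_view_samples` are hypothetical, and the masking rule (replace every non-space character with an asterisk) is an assumption inferred from the `****** ** **` example.

```python
from collections import Counter

MAX_DISPLAY_CHARS = 200  # truncation length stated in the text


def mask(value):
    """Replace every non-space character with '*', keeping the spacing."""
    return "".join(ch if ch.isspace() else "*" for ch in value)


def sample_view(values, can_view_samples):
    """Return (displayed value, count, share) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    rows = []
    for value, count in counts.most_common():
        shown = value[:MAX_DISPLAY_CHARS]
        if not can_view_samples:
            shown = mask(shown)    # hide content but keep the shape
        rows.append((shown, count, count / total))
    return rows


samples = ["Boston MA US", "Boston MA US", "Austin TX US"]
print(sample_view(samples, can_view_samples=True))
print(sample_view(samples, can_view_samples=False))
```

Masking after counting keeps the frequency and distribution intact, so a user without the permission can still see how values are distributed without seeing the values themselves.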
Properties panel
Displays a summary of the resource properties, such as the last update timestamp, name, version, and type of the resource.
Business Terms panel
Lists associated business terms for the resource. You can also click Add Term to open the Business Terms dialog box and add terms to the resource. For more information, see the Administer Pentaho Data Catalog document.
Tags panel
Lists the tags associated with the resource. In addition, you can add tags such as “quality:45” (the key should be unique) to the resource, which helps identify the resource by its tagged keywords.
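One way to picture the unique-key rule is a small sketch that stores tags in a dictionary keyed by the part before the colon. This is an illustration under assumptions, not Data Catalog behavior: the `add_tag` helper is hypothetical, and rejecting a duplicate key is only one possible reading of "the key should be unique".

```python
def add_tag(tags, tag):
    """Add a 'key:value' tag to a dict of tags, rejecting duplicate keys."""
    key, separator, value = tag.partition(":")
    key = key.strip()
    if not separator or not key:
        raise ValueError(f"expected a tag in key:value form, got {tag!r}")
    if key in tags:
        # Assumption: a second tag with the same key is rejected.
        raise KeyError(f"tag key {key!r} is already in use")
    tags[key] = value.strip()
    return tags


resource_tags = {}
add_tag(resource_tags, "quality:45")
add_tag(resource_tags, "owner:finance")
print(resource_tags)
```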
Custom Properties panel
Lists the first five custom properties associated with the resource. Custom properties are user-defined metadata attributes or fields that can be associated with data assets, such as databases, tables, files, or documents, to provide additional context and information about those assets. To add a custom property, click Add Custom Property and provide the required information. To see the complete list of custom properties added to the resource, go to the Properties tab.