Planning your Data Catalog

Get started with Pentaho Data Catalog

Plan your data catalog before you build it. Use the following guidelines:

Plan data sources to add
When setting up data analytics with Data Catalog, start by adding the data sources that you want to analyze.
Before you add your data sources, gather the configuration information needed to set them up. Your database administrator is best positioned to provide this information, which includes the following:
  • Data source type
  • Configuration method, for example: credentials, SSL, or a URI (Uniform Resource Identifier)
    • For credentials, username and password, host name, and port number
    • For SSL, encryption information, for example:
      • Encryption type, such as: Encryption only, Encryption with Server and Client Authentication
      • Trust store type and location
      • Trust store password and cipher suite
      • Key store type, location, and password
    • For a URI, also known as a connection string, the username and password
  • Any driver needed
  • For Amazon Web Services (AWS) data source types, a configuration method isn't specified. You must have information such as AWS region, account number, IAM username, access key ID, and secret access key to configure these data source types.
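The list above can be turned into a simple planning checklist. The following Python sketch is illustrative only: the field names and the `missing_fields` helper are hypothetical and are not Data Catalog settings or APIs. It records only which items have been gathered, not their values, so no passwords or keys end up in the planning notes.

```python
# Hypothetical planning checklist. The field names are illustrative labels
# taken from the list above, not actual Data Catalog setting names.
REQUIRED_FIELDS = {
    "credentials": {"username", "password", "host_name", "port"},
    "ssl": {
        "encryption_type",
        "trust_store_type", "trust_store_location", "trust_store_password",
        "cipher_suite",
        "key_store_type", "key_store_location", "key_store_password",
    },
    "uri": {"connection_string", "username", "password"},
    # AWS data source types have no configuration method; "aws" is used here
    # only as a convenient key for the planning checklist.
    "aws": {"region", "account_number", "iam_username",
            "access_key_id", "secret_access_key"},
}


def missing_fields(method, gathered):
    """Return the sorted names of required items not yet gathered.

    `method` is one of the keys in REQUIRED_FIELDS; `gathered` is an
    iterable of item names that are already in hand.
    """
    if method not in REQUIRED_FIELDS:
        raise ValueError(f"unknown configuration method: {method!r}")
    return sorted(REQUIRED_FIELDS[method] - set(gathered))


if __name__ == "__main__":
    # Example: the DBA has supplied a user name and host name so far.
    print(missing_fields("credentials", ["username", "host_name"]))
```

Running a check like this for each planned data source shows at a glance what still needs to be requested from the database administrator.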
Tip: Data Catalog uses the name you enter when you set up a data source throughout the data catalog. As a best practice, adopt a naming convention that users can easily understand and that you can apply to all the data sources and types you add to Data Catalog.
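As one way to apply a naming convention consistently, you could check planned names before entering them. The following Python sketch assumes a convention of the form department-sourcetype-environment in lowercase (for example, finance-postgresql-prod). Both the convention and the `follows_convention` helper are assumptions made for illustration; Data Catalog does not prescribe a naming format.

```python
import re

# Assumed convention (not defined by Data Catalog):
#   <department>-<source type>-<environment>, all lowercase,
#   where environment is one of dev, test, or prod.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*-[a-z][a-z0-9]*-(dev|test|prod)")


def follows_convention(name):
    """Return True if the planned data source name matches the convention."""
    return NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for planned in ["finance-postgresql-prod", "Finance DB 2"]:
        status = "ok" if follows_convention(planned) else "rename"
        print(f"{planned}: {status}")
```

Adjust the pattern to whatever convention your organization agrees on; the point is to settle it before adding the first data source.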
Plan business glossaries and business terms
The business glossary is an organized list of business terms and their definitions, intended to serve as the single, definitive reference for an organization. You can associate business terms with data elements, business rules, related terms, and custom attributes to form a comprehensive view of the organization’s business concepts and data landscape.
You can organize business glossary terms in a domain and category hierarchy, under just a domain, or as stand-alone terms. If you do not specify a domain or category, the term appears as Unassigned.
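When drafting a glossary on paper, it can help to model where each term will land. The following Python sketch covers the three placements described above. The `PlannedTerm` class and `placement` function are hypothetical planning aids, not part of Data Catalog, and the sketch assumes that a category always belongs to a domain.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PlannedTerm:
    """A business term as drafted during planning (hypothetical structure)."""
    name: str
    domain: Optional[str] = None
    category: Optional[str] = None


def placement(term: PlannedTerm) -> List[str]:
    """Return the planned location of a term as a list of path segments."""
    if term.category is not None and term.domain is None:
        # Assumption: a category cannot exist without a parent domain.
        raise ValueError("a category requires a domain")
    if term.domain is None:
        # Stand-alone term: no domain or category was specified.
        return ["Unassigned", term.name]
    if term.category is None:
        return [term.domain, term.name]
    return [term.domain, term.category, term.name]


if __name__ == "__main__":
    drafts = [
        PlannedTerm("Net Sales", domain="Finance", category="Revenue"),
        PlannedTerm("Fiscal Year", domain="Finance"),
        PlannedTerm("Customer"),
    ]
    for draft in drafts:
        print(" / ".join(placement(draft)))
```

Listing the planned placements this way makes it easy to spot terms that would end up as Unassigned before you create them.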
Plan users and permissions
As a best practice, plan the access your users need before you add them to the system, so that access to data sources and business data is restricted to the users who need it. You can add administrators for departments or lines of business, who can then add users with the permissions needed for their specific work responsibilities. You can also use communities, which are custom roles, to fine-tune access to specific data source types and other Data Catalog assets.

You are now ready to start building your Data Catalog.