Installing Data Catalog offline

Get started with Pentaho Data Catalog

Version
10.1.x
Part Number
MK-95PDC001-02
Important: Before installing Data Catalog, it is a best practice to back up your conf/.env file so that any environment customizations you have made are preserved if the file is overwritten during installation. During installation, Data Catalog checks the conf/.env file for a PDC_DATA_ENCRYPTION_KEY environment variable. If the variable exists, the conf/.env file is retained. If the variable does not exist, Data Catalog generates a new .env file containing a PDC_DATA_ENCRYPTION_KEY environment variable. If needed, you can then copy any custom environment variable settings from your saved file into the new .env file.
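The backup and the check described above can be sketched in shell. This is a minimal illustration, not part of the Data Catalog installer: the function names backup_env_file and has_encryption_key are hypothetical, and the sketch assumes that conf/.env uses plain KEY=value lines and that you run it from the deployment directory.

```shell
#!/bin/sh
# Sketch only: back up conf/.env and report whether it already defines
# PDC_DATA_ENCRYPTION_KEY. Function names and the backup naming scheme
# are assumptions made for illustration.

# Copy the given env file to a timestamped backup beside it and print the backup path.
backup_env_file() {
    env_file="$1"
    [ -f "$env_file" ] || { echo "no env file at $env_file" >&2; return 1; }
    backup="${env_file}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p "$env_file" "$backup" && echo "$backup"
}

# Return 0 if the env file contains a line that assigns PDC_DATA_ENCRYPTION_KEY.
has_encryption_key() {
    grep -q '^[[:space:]]*PDC_DATA_ENCRYPTION_KEY=' "$1" 2>/dev/null
}

# Example run from the deployment directory; does nothing if conf/.env is absent.
if [ -f conf/.env ]; then
    backup_env_file conf/.env
    if has_encryption_key conf/.env; then
        echo "PDC_DATA_ENCRYPTION_KEY found, conf/.env should be retained"
    else
        echo "PDC_DATA_ENCRYPTION_KEY missing, expect a newly generated .env"
    fi
fi
```

Keeping the backup next to the original makes it easy to compare the two files after installation and copy custom settings back if a new .env file was generated.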

Your organization may use systems that, for security reasons, are not connected to the internet. Use this procedure to install Data Catalog on a system that is not connected to the internet.

Perform the following steps to install Data Catalog offline:

  1. Verify that you have root privileges or have the necessary permissions to run Docker.
  2. Open a terminal window on your dedicated Data Catalog deployment server.
  3. Download the offline installation package from the JFrog Repository at the following link: https://one.hitachivantara.com/artifactory/pdc-generic-release/pentaho/pdc-docker-deployment/release-v10.1/pdc-pdso-10.1.0-offline.tgz
  4. Extract the package and change to the deployment directory using the following commands:
    tar -xvf [name of offline installation package].tgz
    cd pentaho/pdc-docker-deployment

    In this case, the name of the offline installation package is pdc-pdso-10.1.0-offline. The Docker images required for installation are packaged in the vendor directory.

  5. Use the following command to load the required installation images into Docker:
    docker load -i vendor/pdc-images.tar
  6. Start the Data Catalog application using the following command:
    sh pentaho/pdc-docker-deployment/pdc.sh up
    You may get a message to set the GLOBAL_SERVER_HOST_NAME variable:
    GLOBAL_SERVER_HOST_NAME env is not set, please select an environment variable value from the list or type your own:
    1. IP address
    2. Hostname
    3. Hostname.localhost.localdomain
    4. Other
    #?    1
  7. (Optional) If you get the "GLOBAL_SERVER_HOST_NAME env is not set" message, enter the number of the option that you want to use as the variable value and press Enter.
    If you select 1, the script sets the GLOBAL_SERVER_HOST_NAME variable in the conf/.env file to the IP address.
  8. Start all the Docker containers using the following command:
    sh pdc.sh up
    The installation script uses the packaged Docker images for the Data Catalog release and the Data Storage Optimizer release, if installed, to create and run Docker containers on your dedicated server.
The installation is ready for use after all the Docker containers have successfully started.
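
One way to confirm that the containers have started is to inspect the output of docker ps. The following is a minimal sketch, not part of pdc.sh: list_not_up is a hypothetical helper, and the check covers every container on the host rather than only the Data Catalog containers, so unrelated stopped containers are reported as well.

```shell
#!/bin/sh
# Sketch only: report containers whose status is not "Up".
# list_not_up is a hypothetical helper, not part of pdc.sh.

# Read "name|status" lines on stdin and print the names whose
# status does not begin with "Up".
list_not_up() {
    awk -F '|' 'NF >= 2 && $2 !~ /^Up/ { print $1 }'
}

# Query Docker only when the CLI exists and the daemon is reachable.
if command -v docker >/dev/null 2>&1 && docker info >/dev/null 2>&1; then
    not_up=$(docker ps -a --format '{{.Names}}|{{.Status}}' | list_not_up)
    if [ -z "$not_up" ]; then
        echo "All containers report an Up status"
    else
        echo "Containers not running:"
        echo "$not_up"
    fi
fi
```

Separating the parsing from the docker call keeps the helper usable on saved docker ps output, for example when collecting diagnostics from a server you cannot query directly.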

Access Data Catalog through your browser (the Chrome browser is recommended) using the hostname or IP address, as follows:

[hostname or IP address]/pdc
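
If you are unsure which value to use, the address can be derived from the GLOBAL_SERVER_HOST_NAME variable that the script writes to conf/.env. The sketch below is illustrative only: pdc_address is a hypothetical helper, and it assumes that the variable is stored as a plain GLOBAL_SERVER_HOST_NAME=value line.

```shell
#!/bin/sh
# Sketch only: print the Data Catalog address from the value of
# GLOBAL_SERVER_HOST_NAME in an env file. pdc_address is a hypothetical
# helper, and the KEY=value file layout is an assumption.

pdc_address() {
    env_file="${1:-conf/.env}"
    # Take the last assignment in the file and drop optional double quotes.
    host=$(sed -n 's/^GLOBAL_SERVER_HOST_NAME=//p' "$env_file" 2>/dev/null | tail -n 1 | tr -d '"')
    [ -n "$host" ] || { echo "GLOBAL_SERVER_HOST_NAME not set in $env_file" >&2; return 1; }
    echo "$host/pdc"
}

# Example run from the deployment directory; does nothing if conf/.env is absent.
if [ -f conf/.env ]; then
    pdc_address conf/.env || true
fi
```

The helper prints the address in the same [hostname or IP address]/pdc form shown above, so it can be pasted directly into the browser.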

Note: For new installations, you are redirected to the PDC login page.
Data Catalog provides a set of default users for demonstrating and testing. These default users have the following specific roles assigned:
Role              Actions
Admin             A user who can configure the product
Data User         A user who is interested in leveraging Data Catalog to find data for use for a business operation
Data Steward      A user who will update and process data in Data Catalog for use for a business operation, including migrating data for Pentaho Data Storage Optimizer
Business User     A user who needs to view business-specific glossaries and dictionaries
Business Steward  A user who will maintain business-specific glossaries and dictionaries
Data Developer    A user who will create and update business rules in Data Catalog or metadata rules in Data Storage Optimizer
For more information, see Manage users and permissions in the Administer Pentaho Data Catalog document.
Refer to the installation package for credential details for the default users. This information is found in an encrypted file.
Important: For Development and Production environments, it is a best practice to create users upon installation and deprecate these default users.