Add a cluster connection by import

Pentaho Data Integration

Version
9.3.x
Part Number
MK-95PDIA003-15
You can create a new cluster connection by importing the site.xml files from an existing cluster. Perform the following steps to create a cluster connection by import.
  1. In the PDI client, create a new job or transformation or open an existing one.
  2. Click the View tab and then right-click the Hadoop Clusters folder.
  3. Click Import cluster.
    The Hadoop Clusters dialog box appears.
  4. In the Cluster name field, enter a user-defined name for the cluster connection.
    Note: Valid cluster names may include uppercase and lowercase letters, numbers, and hyphens, but cannot end with a hyphen. To ensure a valid cluster name, do not use any other symbols, punctuation characters, or blank spaces.
    After you create the connection, you can locate this named connection in the View tab on the PDI client.
  5. Use the Driver and Version options to select the distribution of Hadoop on your cluster and its version number. The Hitachi Vantara Lumada and Pentaho Support Portal provides drivers that you can download and install for supported versions of Amazon EMR, Cloudera, Google Dataproc, and Hortonworks.
  6. Click Browse to add file(s) and navigate to the directory containing the site.xml files provided by your cluster administrator.
    The required files include:
    • hive-site.xml
    • mapred-site.xml
    • yarn-site.xml
    • core-site.xml
    • hbase-site.xml
    • hdfs-site.xml
    • oozie-site.xml (if you are using Oozie in your configuration)
  7. Click Open.
    The Site XML files section displays the files you selected.
  8. If you are connecting to a secure cluster, enter your user name and password in the HDFS section.
  9. Click Next and specify the security option for your cluster.
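Before you start the import, it can help to check the cluster name and the site.xml directory outside the PDI client. The following Python sketch is not part of PDI and does not reproduce its own validation. It only applies the naming rule from step 4 and the file list from step 6. The function names is_valid_cluster_name and missing_site_files are hypothetical, chosen for this illustration.

```python
import re
from pathlib import Path

# Naming rule from step 4: letters, numbers, and hyphens only,
# and the last character must not be a hyphen.
# Assumption: a leading hyphen is accepted, because the rule only
# restricts the final character.
CLUSTER_NAME_PATTERN = re.compile(r"[A-Za-z0-9-]*[A-Za-z0-9]")

# Files listed as required in step 6.
REQUIRED_SITE_FILES = [
    "hive-site.xml",
    "mapred-site.xml",
    "yarn-site.xml",
    "core-site.xml",
    "hbase-site.xml",
    "hdfs-site.xml",
]

# Needed only if Oozie is part of the configuration.
OOZIE_SITE_FILE = "oozie-site.xml"


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the naming rule from step 4."""
    return CLUSTER_NAME_PATTERN.fullmatch(name) is not None


def missing_site_files(directory, uses_oozie: bool = False) -> list:
    """Return the required site.xml file names not found in the directory."""
    required = list(REQUIRED_SITE_FILES)
    if uses_oozie:
        required.append(OOZIE_SITE_FILE)
    folder = Path(directory)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    # Hypothetical name and path, for illustration only.
    print(is_valid_cluster_name("prod-cluster-01"))
    print(missing_site_files("/path/to/site-xml-files", uses_oozie=True))
```

An empty list from missing_site_files means every file named in step 6 is present in the directory. Any names it returns are the files to request from your cluster administrator before you click Browse to add file(s).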