You can create a new cluster connection by importing the site.xml files from an existing cluster. Perform the following steps to create a cluster connection by import.
- In the PDI client, create a new job or transformation or open an existing one.
- Click the View tab and then right-click the Hadoop Clusters folder.
- Click Import cluster. The Hadoop Clusters dialog box appears.
- Enter a user-defined name for the cluster connection in the Cluster name field.
Note: Valid cluster names may include uppercase and lowercase letters, numbers, and hyphens, but cannot end with a hyphen. Do not use any other symbols, punctuation characters, or blank spaces.
After you create the connection, you can locate it by this name in the View tab of the PDI client.
- Use the Driver and Version options to select the distribution of Hadoop on your cluster and its version number. The Hitachi Vantara Lumada and Pentaho Support Portal provides drivers that you can download and install for supported versions of Amazon EMR, Cloudera, Google Dataproc, and Hortonworks.
- Click Browse to add file(s), and then browse to the directory containing the site.xml files that your cluster administrator provided.
The required files include:
- hive-site.xml
- mapred-site.xml
- yarn-site.xml
- core-site.xml
- hbase-site.xml
- hdfs-site.xml
- oozie-site.xml (if you are using Oozie in your configuration)
- Click Open. The Site XML files section displays the files you selected.
- Enter your user name and password in the HDFS section if you are connecting to a secure cluster.
- Click Next and specify the security option for your cluster.
- If your Hadoop cluster is non-secure, select None and then click Next to test your connection.
- If your Hadoop cluster is secure, you need to add security to your cluster connection. See Add security to cluster connections for instructions.
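Before you open the Hadoop Clusters dialog box, you may want to confirm that the cluster name follows the naming rule and that the site.xml files are present. The following is a minimal, hypothetical pre-flight sketch in Python; it is not part of PDI. The function names, the regular expression (derived from the naming rule in the procedure), and the check that each file's root element is `<configuration>` (the usual root of Hadoop site files) are all assumptions.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical helper, not part of PDI.
# Naming rule from the procedure: letters, digits, and hyphens only,
# and the name must not end with a hyphen (so it must end with a letter or digit).
CLUSTER_NAME_RE = re.compile(r"[A-Za-z0-9-]*[A-Za-z0-9]")

# Files listed as required in the procedure.
REQUIRED_SITE_FILES = [
    "hive-site.xml",
    "mapred-site.xml",
    "yarn-site.xml",
    "core-site.xml",
    "hbase-site.xml",
    "hdfs-site.xml",
]
# Needed only if you use Oozie in your configuration.
OOZIE_SITE_FILE = "oozie-site.xml"


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the whole name matches the assumed naming rule."""
    return CLUSTER_NAME_RE.fullmatch(name) is not None


def missing_site_files(directory, use_oozie: bool = False) -> list:
    """List expected site.xml files that are not present in the directory."""
    expected = list(REQUIRED_SITE_FILES)
    if use_oozie:
        expected.append(OOZIE_SITE_FILE)
    folder = Path(directory)
    return [name for name in expected if not (folder / name).is_file()]


def has_configuration_root(path) -> bool:
    """Assumption: a usable site file parses as XML with a <configuration> root."""
    try:
        return ET.parse(path).getroot().tag == "configuration"
    except (ET.ParseError, OSError):
        # Unparseable or unreadable files are treated as unusable.
        return False
```

For example, `is_valid_cluster_name("cdh-prod-01")` returns `True`, while `is_valid_cluster_name("cdh-prod-")` returns `False` because the name ends with a hyphen. An empty list from `missing_site_files` only means the files exist; it does not confirm that their contents match your cluster.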