Create a VFS connection

Pentaho Data Integration

Version: 9.3.x
Part Number: MK-95PDIA003-15
Perform the following steps to create a VFS connection in PDI:
  1. Start the PDI client (Spoon) and create a new transformation or job.
  2. In the View tab of the Explorer pane, right-click on the VFS Connections folder, and then click New.
    The New VFS connection dialog box opens.
  3. In the Connection name field, enter a name that uniquely describes this connection.
    The name can contain spaces, but it cannot include special characters, such as #, $, and %.
  4. In the Connection type field, select one of the following types:
    Google Cloud Storage:
    The Google Cloud Storage file system. See Google Cloud Storage for more information on this protocol.
    Snowflake Staging:
    A staging area used by Snowflake to load files. See Snowflake staging area for more information on this protocol.
    Amazon S3 / MinIO:
    • Simple Storage Service (S3) accesses resources on Amazon Web Services. See Working with AWS Credentials for Amazon S3 setup instructions, and the example credentials file after this list.
    • MinIO accesses data objects on an Amazon S3-compatible storage server. See the MinIO Quickstart Guide for MinIO setup instructions.
    HCP:
    The Hitachi Content Platform. You must configure HCP and PDI before accessing the platform. You must also configure object versioning in HCP Namespaces. See Access to HCP for more information.
    Catalog:
    The Pentaho Data Catalog. You must configure your Data Catalog connection before accessing the platform. Enter the authentication type, connection URL, and account credentials. To access data resources from Data Catalog, an S3 or HDFS connection is also required. See Access to Pentaho Data Catalog for details.
    Azure Data Lake/Blob Storage Gen2:
    The Microsoft Azure Storage services. You must create an Azure account and configure Azure Data Lake Storage Gen2 and Blob Storage. See Access to Microsoft Azure for more information.
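    For the Amazon S3 option above, authentication typically comes from the standard AWS credential sources covered in Working with AWS Credentials. As a minimal illustration with placeholder values, a default profile in the ~/.aws/credentials file looks like this:

      [default]
      aws_access_key_id = <your-access-key-id>
      aws_secret_access_key = <your-secret-access-key>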
  5. (Optional) Enter a description for your connection in the Description field.
  6. Click Next.
  7. On the Connection Details page, enter the information according to your selected Connection type.

    If you selected Amazon S3 / MinIO on the previous page, choose one of the following options:

    • For Amazon: Select the Default S3 connection check box to enable use of Amazon S3.
    • For MinIO: Clear the Default S3 connection check box to enable use of MinIO. Also, select the PathStyle Access check box to enable path-style access; otherwise, bucket-style access is used.
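    The PathStyle Access setting mirrors a general S3 client option. As a minimal sketch outside of PDI, assuming the AWS SDK for Java (v1), a hypothetical local MinIO endpoint, and placeholder credentials, the same distinction looks like this:

      import com.amazonaws.auth.AWSStaticCredentialsProvider;
      import com.amazonaws.auth.BasicAWSCredentials;
      import com.amazonaws.client.builder.AwsClientBuilder;
      import com.amazonaws.services.s3.AmazonS3;
      import com.amazonaws.services.s3.AmazonS3ClientBuilder;

      public class MinioPathStyleExample {
          public static void main(String[] args) {
              // Placeholder endpoint and credentials for a local MinIO server.
              AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                  .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                      "http://localhost:9000", "us-east-1"))
                  // Path-style requests address objects as http://host/bucket/key.
                  // With this disabled, the client uses bucket-style (virtual-hosted)
                  // URLs such as http://bucket.host/key, which many MinIO
                  // deployments cannot resolve.
                  .withPathStyleAccessEnabled(true)
                  .withCredentials(new AWSStaticCredentialsProvider(
                      new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
                  .build();
              s3.listBuckets().forEach(b -> System.out.println(b.getName()));
          }
      }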

    If you selected Azure Data Lake/Blob Storage Gen2 on the previous page, choose one of the following options:

    • For Account Shared Key: Enter your shared account key credential in the Account Shared Key field.
    • For Azure Active Directory: Enter your Azure Active Directory credentials in the Application (client) ID, Client Secret, and Directory (tenant) ID fields.
    • For Shared Access Signature: Enter your SAS token in the Shared Access Signature field.
    Note: For all three Azure options, you must also specify your Azure account name in the Service Account Name field, along with values for the Block Size, Buffer Count, Max Block Upload Size, and Access Tier fields.
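    Outside of PDI, the Account Shared Key option corresponds to shared-key authentication in the Azure Storage SDK for Java (azure-storage-blob v12). The following minimal sketch uses a hypothetical account name and a placeholder key:

      import com.azure.storage.blob.BlobServiceClient;
      import com.azure.storage.blob.BlobServiceClientBuilder;
      import com.azure.storage.common.StorageSharedKeyCredential;

      public class AzureSharedKeyExample {
          public static void main(String[] args) {
              // Placeholder storage account name and base64 account key.
              String accountName = "mystorageaccount";
              String accountKey = "<account-shared-key>";

              BlobServiceClient client = new BlobServiceClientBuilder()
                  .endpoint("https://" + accountName + ".blob.core.windows.net")
                  // Shared-key authentication, the counterpart of the dialog's
                  // Account Shared Key option.
                  .credential(new StorageSharedKeyCredential(accountName, accountKey))
                  .buildClient();

              client.listBlobContainers().forEach(c -> System.out.println(c.getName()));
          }
      }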
  8. (Optional) Click Test to verify your connection.
  9. Click Next to view the connection summary, then click Finish to complete the setup.
You can now use your connection to specify VFS information in your transformation steps or job entries, such as the Snowflake entries or HCP steps. See PDI and Snowflake and PDI and Hitachi Content Platform (HCP) for more information about these entries and steps.
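Once created, a VFS connection is referenced by name in file path fields. In PDI 9.x, named VFS connections typically resolve through pvfs:// URIs of the form pvfs://<connection name>/<path to file>. For example, with a hypothetical connection named my-s3-data, a Text file input step could point at:

  pvfs://my-s3-data/sales/orders.csv

Both the connection name and the path here are placeholders; substitute the name you entered in step 3 and a path that exists in your storage system.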