Connecting to Virtual File Systems

Pentaho Data Integration

Version
9.3.x
Audience
anonymous
Part Number
MK-95PDIA003-09

You can connect to most Virtual File Systems (VFS) through VFS connections in PDI. A VFS connection is a stored set of VFS properties that you can use to connect to a specific file system. In PDI, you can add a VFS connection and then reference that connection whenever you want to access files or folders on your Virtual File System. For example, you can use the VFS connection for Hitachi Content Platform (HCP) in any of the HCP transformation steps without the need to repeatedly enter your credentials for data access.
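A stored connection that many steps reuse works like a lookup from a name to a saved set of properties. The Python sketch below is a minimal illustration of that idea under stated assumptions: `VfsConnection`, `ConnectionRegistry`, and the property keys are hypothetical names invented for this example, not part of PDI's API. PDI itself creates and stores the real connections.

```python
from dataclasses import dataclass, field


@dataclass
class VfsConnection:
    """Hypothetical model of a stored VFS connection: a name plus its properties."""
    name: str
    scheme: str  # illustrative label only, e.g. "hcp" or "s3"
    properties: dict = field(default_factory=dict)


class ConnectionRegistry:
    """Hypothetical store that lets many steps reuse one connection by name."""

    def __init__(self) -> None:
        # Maps connection name -> VfsConnection.
        self._connections = {}

    def add(self, connection: VfsConnection) -> None:
        # Store the properties (including credentials) once, keyed by name.
        self._connections[connection.name] = connection

    def resolve(self, name: str) -> VfsConnection:
        # Steps refer to the connection by name instead of repeating credentials.
        try:
            return self._connections[name]
        except KeyError:
            raise LookupError(f"No VFS connection named {name!r}") from None


registry = ConnectionRegistry()
registry.add(VfsConnection("my-hcp", "hcp", {"access_key": "<key>", "secret_key": "<secret>"}))

# Two different steps look up the same stored connection by name.
reader_conn = registry.resolve("my-hcp")
writer_conn = registry.resolve("my-hcp")
print(reader_conn is writer_conn)  # True
```

The credentials are entered once, in `add`, and every later consumer only needs the connection name.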

A VFS connection lets you define your VFS properties once and reuse them across multiple steps and entries. VFS connections support the following file systems:

Amazon S3/MinIO/HCP
  • Simple Storage Service (S3) accesses the resources on Amazon Web Services. See Working with AWS Credentials for Amazon S3 setup instructions.
  • MinIO accesses data objects on an Amazon S3-compatible storage server. See the MinIO Quickstart Guide for MinIO setup instructions.
  • HCP accesses the Hitachi Content Platform using the S3 protocol. See Access to HCP REST for more information.
Azure Data Lake Gen 1
Azure Data Lake Gen 1 accesses data objects on Microsoft Azure Gen 1 storage services. You must create an Azure account and configure Azure Data Lake Storage Gen 1. See Access to Microsoft Azure for more information.
Azure Data Lake Gen 2/Blob
Azure Data Lake Gen 2/Blob accesses data objects on Microsoft Azure Gen 2 or Blob storage services. You must create an Azure account and configure Azure Data Lake Storage Gen2 and Blob Storage. See Access to Microsoft Azure for more information.
Catalog
Accesses data in the Pentaho Data Catalog. See Access to Data Catalog for more information.
Google Cloud Storage
Accesses data in the Google Cloud Storage file system. See Google Cloud Storage for more information on this protocol.
HCP REST
Accesses data in the Hitachi Content Platform. You must configure HCP and PDI before accessing the platform. See Access to HCP REST for more information.
Snowflake Staging
Accesses the staging area that Snowflake uses to load files. See Snowflake staging area for more information on this protocol.

After you create a VFS connection, you can use it with PDI steps and entries that support the use of VFS connections. If you are connected to a repository, the VFS connection is saved in the repository. If you are not connected to a repository, the connection is saved locally on the machine where it was created.
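The storage rule above comes down to a single check on whether a repository connection is active. The Python sketch below only illustrates that decision; `connection_save_location` is a hypothetical helper, and the returned paths (a `connections/` folder in the repository, one JSON file per connection locally) are an invented layout, not where PDI actually writes its connection definitions.

```python
from pathlib import Path
from typing import Optional


def connection_save_location(name: str, repository_url: Optional[str], local_dir: Path) -> str:
    """Hypothetical helper: decide where a new VFS connection is persisted.

    The storage layout returned here is invented for illustration only.
    """
    if repository_url is not None:
        # Connected to a repository: the connection is saved in the repository.
        return f"{repository_url.rstrip('/')}/connections/{name}"
    # Not connected: the connection is saved on the machine where it was created.
    return str(local_dir / f"{name}.json")


print(connection_save_location("my-s3", "http://repo.example.com/", Path("/home/user/pdi")))
print(connection_save_location("my-s3", None, Path("/home/user/pdi")))
```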

If PDI does not offer a VFS connection type for your Virtual File System, you may be able to access it with the VFS browser. See VFS browser for further details.