Data source connectivity

The following table lists the supported data sources and the requirements for connecting each of them to Data Catalog. A minimal connectivity sketch follows each entry.
AWS S3
  • AWS region where the S3 bucket was created
  • Access key and secret access key
  • Read-only permissions to the S3 bucket
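For illustration, a minimal read check for these values, assuming the boto3 SDK; the region, keys, and bucket name below are placeholders:

```python
import boto3

# Placeholders for the region, access key pair, and bucket name.
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-access-key>",
)

# Listing a few objects succeeds only if the keys grant read access.
response = s3.list_objects_v2(Bucket="example-bucket", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```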
Azure Blob Storage
  • Account Fully Qualified Domain Name (FQDN)
  • Client ID and client key
  • Authentication token endpoint (authTokenEndpoint)
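A sketch of the same check for Azure Blob Storage, assuming the azure-identity and azure-storage-blob packages; the tenant ID is the one behind the authTokenEndpoint, and all values are placeholders:

```python
from azure.identity import ClientSecretCredential
from azure.storage.blob import BlobServiceClient

# The tenant ID comes from the authTokenEndpoint; all values are placeholders.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<client-id>",
    client_secret="<client-key>",
)
service = BlobServiceClient(
    account_url="https://<account>.blob.core.windows.net",  # account FQDN
    credential=credential,
)

# Listing containers succeeds only if the registered app has read access.
for container in service.list_containers():
    print(container.name)
```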
HCP
  • AWS region where the S3 bucket was created
  • Access key and secret access key
  • Read-only permissions to the S3 bucket
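Because HCP exposes an S3-compatible API, the same boto3-based check works when pointed at the HCP namespace endpoint; the endpoint URL, keys, and bucket below are placeholders:

```python
import boto3

# Hypothetical HCP namespace endpoint; HCP accepts S3-style requests here.
hcp = boto3.client(
    "s3",
    endpoint_url="https://ns0.tenant.hcp.example.com",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-access-key>",
)

# A small read-only listing confirms the keys grant access to the bucket.
response = hcp.list_objects(Bucket="example-bucket", MaxKeys=5)
for obj in response.get("Contents", []):
    print(obj["Key"])
```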
HDFS
  • Hadoop version 2.7.2 or later
  • URI that provides the hostname and share folder details
  • Path of the directory to be scanned
  • Read-only access to the directory
To install Data Storage Optimizer for Hadoop, see Installing Data Storage Optimizer for Hadoop.
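A sketch of a read-only directory check over WebHDFS, assuming the third-party hdfs package; the NameNode URI, user, and scan path are placeholders:

```python
from hdfs import InsecureClient

# WebHDFS endpoint of the NameNode (port 50070 on Hadoop 2.x, 9870 on 3.x).
client = InsecureClient("http://namenode.example.com:50070", user="pdc")

# Listing the scan directory succeeds only with read access to it.
for name in client.list("/data/landing"):
    print(name)
```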
OneDrive and SharePoint
  • Application (client) ID, Directory (tenant) ID, and client secret (clientSecret) from an app registered in the Azure portal
  • Delegated permissions and Application permissions in the registered app
  • Read-only permissions to the OneDrive and SharePoint sites
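A sketch of verifying the registered app, assuming the msal and requests packages; it acquires an app-only token with the client credentials and lists SharePoint sites through Microsoft Graph (all IDs are placeholders):

```python
import msal
import requests

# Values from the registered app in the Azure portal (placeholders).
app = msal.ConfidentialClientApplication(
    client_id="<application-client-id>",
    client_credential="<clientSecret>",
    authority="https://login.microsoftonline.com/<directory-tenant-id>",
)
token = app.acquire_token_for_client(
    scopes=["https://graph.microsoft.com/.default"]
)

# Reading site metadata succeeds only if the app permissions were granted.
resp = requests.get(
    "https://graph.microsoft.com/v1.0/sites?search=*",
    headers={"Authorization": f"Bearer {token['access_token']}"},
)
for site in resp.json().get("value", []):
    print(site["displayName"])
```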
RDBMS
  • Read-only access to all database objects and system catalog tables to perform data profiling
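The kind of read-only catalog access that profiling needs can be checked with a query against the system catalog; a sketch assuming SQLAlchemy and a hypothetical PostgreSQL connection string:

```python
from sqlalchemy import create_engine, text

# Hypothetical read-only connection string.
engine = create_engine("postgresql://readonly_user:secret@db.example.com/sales")

# Reading the system catalog succeeds only with the required permissions.
with engine.connect() as conn:
    rows = conn.execute(text(
        "SELECT table_schema, table_name "
        "FROM information_schema.tables "
        "LIMIT 5"
    ))
    for schema, table in rows:
        print(schema, table)
```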
SMB/CIFS
  • URI that provides the hostname and share folder details
  • Username and password to access the SMB/CIFS share directory
  • Path of the directory to be scanned
  • Read-only access to the directory
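A sketch of the same read-only check for an SMB/CIFS share, assuming the third-party smbprotocol package and its smbclient helpers; the server, credentials, and path are placeholders:

```python
import smbclient

# Authenticate once per server; all values are placeholders.
smbclient.register_session(
    "fileserver.example.com", username="scanuser", password="secret"
)

# Listing the scan directory succeeds only with read access to the share.
for name in smbclient.listdir(r"\\fileserver.example.com\share\data"):
    print(name)
```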
Note: To optimize data sources using Pentaho Data Storage Optimizer, read, write, and execute permissions are required in addition to the requirements listed above.