This table shows the Big Data sources that are compatible with specific Pentaho tools.
Data Source | Versions | Analyzer | PIR/PDD | Pentaho Reporting | DSW | PDI Server/Client | PRD | PSW | PME |
---|---|---|---|---|---|---|---|---|---|
Amazon EMR | 7.0.0 [e] (Certified) | No | No | No | No | Yes | Yes | No | No |
Apache Vanilla Hadoop | 3.3.0 (Certified) | No | No | No | Yes | Yes | No | No | No |
Cassandra (Datastax) | 6.8 (Certified) | No | No | No | No | Yes | No | No | No |
Cloudera Data Platform (CDP) Private Cloud | 7.1.9 (for job execution) | No | No | No | No | Yes | Yes | No | Yes |
via Impala (as data source) | | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
via Hive3 [a] (as data source) | | No | Yes | Yes | Yes | Yes | Yes | No | Yes |
Google BigQuery | 1.5.4.1008 [b] | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Google Dataproc [c] (for job execution) | 2.1 [d] | No | No | No | No | Yes | Yes | No | No |
via Hive2 and Google BigQuery (as data source) | | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
Greenplum | 4.3 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Microsoft Azure HDInsight | 4.0 | Yes | Yes | No | No | Yes | No | No | Yes |
MongoDB | 7 | No | No | Yes | No | Yes | Yes | No | No |
Vertica | 11 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

Notes:

A generic Apache Hadoop driver is included in the Pentaho distribution for version 10.2. Other supported drivers can be downloaded from the Support Portal.

[a] Hive3 as a data source for CDP also supports Hive LLAP and Hive3 on Tez.

[b] The Simba driver required for Google BigQuery is the JDBC 4.2-compatible version, which you can download from https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip.

[c] HBase is not supported with Google Dataproc.

[d] Use the Google Dataproc 2.1 driver for your Google Dataproc 2.2 cluster. The Google Dataproc 2.1 driver is certified to work with Google Dataproc 2.2.

[e] EMR clusters (version 7.x and later) built with JDK 17 exclude the commons-lang-2.6.jar library from their standard Hadoop library directories ($HADOOP_HOME/lib). To use the EMR driver for EMR 7.x, obtain the commons-lang-2.6.jar file from a trusted source, such as the official Maven repository (commons-lang » commons-lang » 2.6). Then manually copy the downloaded JAR file to the $HADOOP_HOME/lib or $HADOOP_MAPRED_HOME/lib directory on each node in the EMR cluster so that all worker nodes have access to the library.
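
The manual copy step in the EMR note could be scripted per node. The following is a minimal Python sketch, not a Pentaho or Amazon EMR utility: the function names `candidate_lib_dirs` and `install_jar` are hypothetical, and it assumes the JAR has already been downloaded to the node and that `HADOOP_HOME` or `HADOOP_MAPRED_HOME` is set in the environment. It copies the JAR into the first of the two library directories that exists, since the note lists them as alternatives.

```python
import os
import shutil
from pathlib import Path


def candidate_lib_dirs(env=None):
    """Return the lib directories named in the EMR note, in order of preference.

    Hypothetical helper: reads HADOOP_HOME and HADOOP_MAPRED_HOME from `env`
    (defaults to the process environment) and appends "lib" to each one set.
    """
    env = os.environ if env is None else env
    dirs = []
    for var in ("HADOOP_HOME", "HADOOP_MAPRED_HOME"):
        home = env.get(var)
        if home:
            dirs.append(Path(home) / "lib")
    return dirs


def install_jar(jar_path, lib_dirs):
    """Copy the JAR into the first existing directory in `lib_dirs`.

    Returns the destination path. An existing copy is left untouched.
    """
    jar = Path(jar_path)
    if not jar.is_file():
        raise FileNotFoundError(f"JAR not found: {jar}")
    for lib_dir in lib_dirs:
        target_dir = Path(lib_dir)
        if not target_dir.is_dir():
            continue  # this directory does not exist on the node; try the next
        target = target_dir / jar.name
        if not target.exists():
            shutil.copy2(jar, target)
        return target
    raise FileNotFoundError("No Hadoop lib directory found on this node")


# Example call on a node (assumes the JAR is in the working directory):
# install_jar("commons-lang-2.6.jar", candidate_lib_dirs())
```

The script handles a single node only. It would need to be run on every node in the cluster, for example through whatever provisioning mechanism the cluster already uses.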