This table shows the Big Data sources that are compatible with specific Pentaho tools.
Data Source | Versions | Analyzer | PIR/PDD | Pentaho Reporting | DSW | PDI Server/Client | PRD | PSW | PME |
---|---|---|---|---|---|---|---|---|---|
Amazon EMR | 5.21, 5.24, 5.32<sup>f</sup> | Yes | Yes | No | No | Yes | Yes | No | Yes |
Cloudera | 6.1, 6.2, 6.3<sup>a</sup> (for job execution) | No | No | No | No | Yes | Yes | No | Yes |
via Impala<sup>b</sup> (as data source) | | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
via Hive2<sup>c</sup> (as data source) | | No | Yes | Yes | Yes | Yes | Yes | No | Yes |
Cloudera Data Platform | 7.1.x (for job execution) | No | No | No | No | Yes | Yes | No | Yes |
via Impala<sup>h</sup> (as data source) | | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
via Hive3<sup>i</sup> (as data source) | | No | Yes | Yes | Yes | Yes | Yes | No | Yes |
DataStax | 4.6, 4.8 | No | No | No | No | Yes | No | No | No |
Google BigQuery | 1.2.2.1004<sup>e</sup> | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Google Dataproc<sup>g</sup> | 1.4, 2.2<sup>j</sup> (for job execution) | No | No | No | No | Yes | Yes | No | No |
via Hive2 and Google BigQuery (as data source) | | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes |
Greenplum | 4.2, 4.3 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Hortonworks | 3.0, 3.1 (for job execution) | No | No | No | No | Yes | Yes | No | Yes |
via Hive2<sup>c</sup> (as data source) | | No | Yes | Yes | Yes | Yes | Yes | No | Yes |
via Spark SQL<sup>d</sup> (as data source) | | No | No | No | No | Yes | No | No | No |
Microsoft Azure HDInsight | 4.0 | Yes | Yes | No | No | Yes | No | No | Yes |
MongoDB | 4.0.2 | No | No | Yes | No | Yes | Yes | No | No |
Netezza | 7.1, 7.2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
SAP HANA | SPS | No | No | No | No | Yes | No | No | No |
Teradata | 14.10, 15.0 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
Vertica | 9.3.0.0 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |

Notes:

- A generic Apache Hadoop driver is included in the Pentaho 9.3 distribution. Other supported drivers can be downloaded from the Hitachi Vantara Lumada and Pentaho Support Portal.
- <sup>a</sup> You must have the current version of the Pentaho release to use the CDH 6.1 driver. The CDH 6.1 driver requires the Cloudera Impala JDBC Connector 2.6.4 driver, and CDH 6.1 requires Pentaho Service Pack 8.2.0.4 or later. The CDH 6.1 driver also works with CDH 6.2 and CDH 6.3.
- <sup>b</sup> As with any data source, the performance of Pentaho Analyzer on Impala depends on the data shape, Impala's configuration, and the types of queries. See the Customer Portal best-practices article on Pentaho Analyzer on Impala for more information.
- <sup>c</sup> Hive2 as a data source for CDH also supports Hive on Spark. Hive2 as a data source for HDP also supports Hive on Tez.
- <sup>d</sup> The Simba Spark SQL driver must be downloaded, installed, and configured before it can be used as a data source for Hortonworks. See the Install Pentaho Data Integration and Analytics document for more information.
- <sup>e</sup> Google BigQuery requires the JDBC 4.2-compatible version of the Simba driver. See https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip.
- <sup>f</sup> Use the EMR 5.21 driver for your EMR 5.24 or EMR 5.32 cluster. The EMR 5.21 driver is certified to work with EMR 5.24 and EMR 5.32.
- <sup>g</sup> HBase is not supported with Google Dataproc.
- <sup>h</sup> You must have the current version of the Pentaho release to use the CDP 7.1.4 driver. The CDP 7.1.4 driver requires the Cloudera Impala JDBC Connector 2.6.4 driver.
- <sup>i</sup> Hive3 as a data source for CDP also supports Hive LLAP and Hive3 on Tez.
- <sup>j</sup> Use the Google Dataproc 1.8 driver for your Google Dataproc 2.2 cluster. The Google Dataproc 1.8 driver is certified to work with Google Dataproc 2.2.