Google BigQuery

Try Pentaho Data Integration and Analytics

Version
9.3.x
Audience
anonymous
Part Number
MK-95PDIA000-11
You can use Google BigQuery as a data source with the Pentaho User Console or with the PDI client.

Before you begin, you must have a Google account and must create service account credentials in the form of a key file in JSON format to connect to Google BigQuery. To create service account credentials, see the Google Cloud Storage Authentication documentation.

Additionally, you must set permissions for your BigQuery and Google Cloud accounts. To configure your service account authentication, see the Google Service Account documentation.

Perform the following steps to create a JDBC connection to a Google BigQuery data source from the User Console or PDI client.

  1. Stop the Pentaho Server.
  2. Download the ZIP file containing the Simba version 1.2.2.1004 JDBC 4.2 driver for Google BigQuery from https://storage.googleapis.com/simba-bq-release/jdbc/SimbaJDBCDriverforGoogleBigQuery42_1.2.2.1004.zip.
  3. Navigate to the server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib directory for the User Console or the design-tools/data-integration/lib directory for the PDI client and delete any files associated with previous versions of Google BigQuery.
    Visually verify each file to ensure the older version is deleted.
  4. Extract the following files to the server/pentaho-server/tomcat/webapps/pentaho/WEB-INF/lib folder for the User Console or the design-tools/data-integration/lib directory for the PDI client.
    • animal-sniffer-annotations-1.14.jar
    • api-common-1.7.0.jar
    • avro-1.9.0.jar
    • checker-compat-qual-2.5.2.jar
    • error_prone_annotations-2.1.3.jar
    • gax-1.42.0.jar
    • gax-grpc-1.42.0.jar
    • google-api-client-1.28.0.jar
    • google-api-services-bigquery-v2-rev426-1.25.0.jar
    • google-auth-library-credentials-0.15.0.jar
    • google-auth-library-oauth2-http-0.13.0.jar
    • GoogleBigQueryJDBC42.jar
    • google-cloud-bigquerystorage-0.85.0-alpha.jar
    • google-cloud-core-1.67.0.jar
    • google-cloud-core-grpc-1.67.0.jar
    • google-http-client-1.29.0.jar
    • google-http-client-apache-2.0.0.jar
    • google-http-client-jackson2-1.28.0.jar
    • google-oauth-client-1.28.0.jar
    • grpc-alts-1.18.0.jar
    • grpc-auth-1.18.0.jar
    • grpc-context-1.18.0.jar
    • grpc-core-1.18.0.jar
    • grpc-google-cloud-bigquerystorage-v1beta1-0.50.0.jar
    • grpc-grpclb-1.18.0.jar
    • grpc-netty-shaded-1.18.0.jar
    • grpc-protobuf-1.18.0.jar
    • grpc-protobuf-lite-1.18.0.jar
    • grpc-stub-1.18.0.jar
    • gson-2.7.jar
    • j2objc-annotations-1.1.jar
    • javax.annotation-api-1.3.2.jar
    • jsr305-3.0.2.jar
    • opencensus-api-0.18.0.jar
    • opencensus-contrib-grpc-metrics-0.18.0.jar
    • opencensus-contrib-http-util-0.18.0.jar
    • protobuf-java-3.7.0.jar
    • protobuf-java-util-3.7.0.jar
    • proto-google-cloud-bigquerystorage-v1beta1-0.50.0.jar
    • proto-google-common-protos-1.15.0.jar
    • proto-google-iam-v1-0.12.0.jar
    • threetenbp-1.3.3.jar
    Note: The Google BigQuery connection name does not display in the User Console Database Connection dialog box until you copy these files.
  5. Restart the Pentaho Server.
  6. Log on to the User Console or the PDI client, then open the Database Connection dialog box.
    See the Install Pentaho Data Integration and Analytics document for more information on the Database Connection dialog box.
  7. In the Database Connection dialog box, select General, then select Google BigQuery as the Database Type.
  8. In the Settings area, enter the information for your Google BigQuery account.
    • The Host Name is the URL to Google's BigQuery web services API. For example, https://www.googleapis.com/bigquery/v2
    • The Project ID in the PDI client and the Database name in the User Console are identical.
    • The Port Number is 443.
  9. Click Options, then add the following parameters and values.
    Parameter Value
    OAuthType 0 (Zero)
    OAuthServiceAcctEmail Specify your service account email address.
    OAuthPvtKeyPath Specify the path to your private key credential file.
    Timeout Specify the amount of time, in seconds, before the server closes the connection. The recommended value is 120 seconds.
  10. Click Test to verify that you can connect to your data.