Google BigQuery
The Google BigQuery destination streams data into Google BigQuery. You can use other destinations to write to Google Bigtable, Google Cloud Storage, and Google Pub/Sub.
When you configure the destination, you define the existing BigQuery dataset and table to stream data into. The destination streams each record into a row in a BigQuery table. You can optionally define an expression to specify the insert ID to insert or update. The insert ID is a unique ID for each row. If you do not specify an insert ID, the destination inserts each record into a new row.
The destination maps fields from records to BigQuery columns based on matching names and compatible data types. You can configure the destination to ignore invalid columns when the destination cannot map a field to an existing BigQuery column. You can also configure the table cache size for the destination.
You also define the project and credentials provider to use to connect to Google BigQuery. The destination can retrieve credentials from the Google Application Default Credentials or from a Google Cloud service account credentials file.
For more information about streaming data into Google BigQuery, see the Google BigQuery documentation.
BigQuery Data Types
The Google BigQuery destination maps fields from records to BigQuery columns in existing tables based on matching names and compatible data types. If needed, the destination converts Data Collector data types to BigQuery data types.
The following table lists the Data Collector data types and the BigQuery data types that the destination converts them to:
Data Collector Data Type | BigQuery Data Type |
---|---|
Boolean | Boolean |
Byte Array | Bytes |
Date | Date |
Datetime | Datetime or Timestamp |
Double | Float |
Float | Float |
Integer | Integer |
List | Array |
List-Map | Record with repeated fields |
Long | Integer |
Map | Record |
Short | Integer |
String | String |
Time | Time |
- Byte
- Character
- Decimal
Credentials
When the Google BigQuery destination streams data into Google BigQuery, it must pass credentials to BigQuery. Configure the destination to retrieve the credentials from the Google Application Default Credentials or from a Google Cloud service account credentials file.
Default Credentials Provider
When configured to use the Google Application Default Credentials, the destination checks
for the credentials file defined in the GOOGLE_APPLICATION_CREDENTIALS
environment variable. If the environment variable doesn't exist and Data Collector is
running on a virtual machine (VM) in Google Cloud Platform (GCP), the destination uses
the built-in service account associated with the virtual machine instance.
For more information about the default credentials, see Google Application Default Credentials in the Google Developer documentation.
Complete the following steps to define the credentials file in the environment variable:
- Use the Google Cloud Platform Console or the
gcloud
command-line tool to create a Google service account and have your application use it for API access.For example, to use the command line tool, run the following commands:gcloud iam service-accounts create my-account gcloud iam service-accounts keys create key.json --iam-account=my-account@my-project.iam.gserviceaccount.com
- Store the generated credentials file on the Data Collector machine.
- Add the
GOOGLE_APPLICATION_CREDENTIALS
environment variable to the appropriate file and point it to the credentials file.Modify environment variables using the method required by your installation type.
Set the environment variable as follows:
export GOOGLE_APPLICATION_CREDENTIALS="/var/lib/sdc-resources/keyfile.json"
- Restart Data Collector to enable the changes.
- On the Credentials tab for the stage, select Default Credentials Provider for the credentials provider.
Service Account Credentials File (JSON)
When configured to use the Google Cloud service account credentials file, the destination checks for the file defined in the destination properties.
Complete the following steps to use the service account credentials file:
- Generate a service account credentials file in JSON
format.
Use the Google Cloud Platform Console or the
gcloud
command-line tool to generate and download the credentials file. For more information, see generating a service account credential in the Google Cloud Platform documentation. - Store the generated credentials file on the Data Collector machine.
As a best practice, store the file in the Data Collector resources directory,
$SDC_RESOURCES
. - On the Credentials tab for the stage, select Service Account Credentials File for the credentials provider and enter the path to the credentials file.
Configuring a Google BigQuery Destination
Configure a Google BigQuery destination to stream data into Google BigQuery.