Google Pub/Sub Publisher
When you configure the destination, you define the Google Pub/Sub topic ID to write messages to. You also define the project and credentials provider to use to connect to Google Pub/Sub. The destination can retrieve credentials from the Google Application Default Credentials or from a Google Cloud service account credentials file.
By default, the Google Pub/Sub Publisher destination writes messages in batches. Using advanced properties, you can configure the conditions that trigger writing a new batch or disable batch processing to write messages individually. You can also configure the action the destination takes when it reads messages faster than it can write messages.
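To illustrate what "conditions that trigger writing a new batch" can look like, here is a minimal sketch of a batch buffer. It is not the destination's implementation: the `BatchBuffer` class, the `publish` callback, and the two thresholds (`max_messages`, `max_bytes`) are hypothetical names chosen for the example, and a time-based trigger is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class BatchBuffer:
    """Collects payloads and hands them to `publish` as one batch when a threshold is hit."""
    publish: Callable[[List[bytes]], None]  # hypothetical sink that receives a finished batch
    max_messages: int = 1000                # assumed threshold: number of messages per batch
    max_bytes: int = 1_000_000              # assumed threshold: total payload bytes per batch
    _pending: List[bytes] = field(default_factory=list)
    _pending_bytes: int = 0

    def add(self, payload: bytes) -> None:
        # Flush first if this payload would push the current batch past the byte limit.
        if self._pending and self._pending_bytes + len(payload) > self.max_bytes:
            self.flush()
        self._pending.append(payload)
        self._pending_bytes += len(payload)
        # Flush as soon as the message-count threshold is reached.
        if len(self._pending) >= self.max_messages:
            self.flush()

    def flush(self) -> None:
        # Send whatever is pending and start a new, empty batch.
        if self._pending:
            self.publish(self._pending)
            self._pending = []
            self._pending_bytes = 0
```

In this sketch, setting `max_messages=1` makes every message its own batch, which approximates writing messages individually.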
A Google Pub/Sub message contains a payload and optional user-defined attributes that describe the payload content. When records include record header attributes, the Google Pub/Sub Publisher destination includes the record header attributes in message attributes. The destination does not include internal record header attributes in message attributes.
For more information about record header attributes, see Record Header Attributes.
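The mapping from record header attributes to message attributes can be pictured with the following sketch. It assumes the caller knows which attribute names are internal and passes them in as `internal_names`; that parameter, the `build_message` function, and the dictionary layout of the result are hypothetical and only illustrate the filtering. Values are converted to strings because Pub/Sub message attributes are string key-value pairs.

```python
from typing import Dict, Iterable


def build_message(payload: bytes,
                  header_attributes: Dict[str, object],
                  internal_names: Iterable[str]) -> Dict[str, object]:
    """Return a message-like dict with the payload and the non-internal attributes."""
    internal = set(internal_names)  # assumption: internal attributes are identified by name
    attributes = {
        name: str(value)            # Pub/Sub attribute values are strings
        for name, value in header_attributes.items()
        if name not in internal
    }
    return {"data": payload, "attributes": attributes}
```

For example, calling `build_message(b"{}", {"source": "orders", "_internal": "x"}, ["_internal"])` keeps only the `source` attribute.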
Credentials
When the Google Pub/Sub Publisher destination publishes messages to a Google Pub/Sub topic, it must pass credentials to Google Pub/Sub. Configure the destination to retrieve the credentials from the Google Application Default Credentials or from a Google Cloud service account credentials file.
Default Credentials Provider
When configured to use the Google Application Default Credentials, the destination
checks for the credentials file defined in the
GOOGLE_APPLICATION_CREDENTIALS environment variable. If the
environment variable doesn't exist and Data Collector is
running on a virtual machine (VM) in Google Cloud Platform (GCP), the destination uses
the built-in service account associated with the virtual machine instance.
For more information about the default credentials, see Google Application Default Credentials in the Google Developer documentation.
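The lookup order described above can be sketched as follows. This is an illustration of the order only, not the Google client library's logic: the `resolve_credentials_source` function, its `on_gcp_vm` flag, and the returned labels are hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_credentials_source(env: Mapping[str, str], on_gcp_vm: bool) -> Optional[str]:
    """Describe where default credentials would come from, or None if nothing applies."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path:
        # The environment variable is checked first and points to a credentials file.
        return f"file:{path}"
    if on_gcp_vm:
        # Fall back to the service account associated with the VM instance.
        return "vm-service-account"
    return None


if __name__ == "__main__":
    # on_gcp_vm is supplied by the caller here; detecting it is outside this sketch.
    print(resolve_credentials_source(os.environ, on_gcp_vm=False))
```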
Complete the following steps to define the credentials file in the environment variable:
- Use the Google Cloud Platform Console or the gcloud command-line tool to create a Google service account and have your application use it for API access.
  For example, to use the command-line tool, run the following commands:
      gcloud iam service-accounts create my-account
      gcloud iam service-accounts keys create key.json --iam-account=my-account@my-project.iam.gserviceaccount.com
- Store the generated credentials file on the Data Collector machine.
- Add the GOOGLE_APPLICATION_CREDENTIALS environment variable to the appropriate file, using the method required by your installation type, and point it to the credentials file.
  Set the environment variable as follows:
      export GOOGLE_APPLICATION_CREDENTIALS="/var/lib/sdc-resources/keyfile.json"
- Restart Data Collector to enable the changes.
- On the Credentials tab for the stage, select Default Credentials Provider for the credentials provider.
Service Account Credentials File (JSON)
When configured to use the Google Cloud service account credentials file, the destination checks for the file defined in the destination properties.
Complete the following steps to use the service account credentials file:
- Generate a service account credentials file in JSON format.
  Use the Google Cloud Platform Console or the gcloud command-line tool to generate and download the credentials file. For more information, see generating a service account credential in the Google Cloud Platform documentation.
- Store the generated credentials file on the Data Collector machine.
  As a best practice, store the file in the Data Collector resources directory, $SDC_RESOURCES.
- On the Credentials tab for the stage, select Service Account Credentials File for the credentials provider and enter the path to the credentials file.
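Before entering the path in the stage, it can help to confirm that the stored file looks like a service account key. The following is a minimal sketch of such a check, not part of Data Collector. The `check_key_file` function is hypothetical, and the field list is an assumption based on the usual layout of a downloaded service account key.

```python
import json
from pathlib import Path
from typing import List

# Assumption: fields normally present in a downloaded service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_key_file(path: str) -> List[str]:
    """Return a list of problems found in the key file; an empty list means none found."""
    file = Path(path)
    if not file.is_file():
        return [f"{path} does not exist or is not a file"]
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path} is not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append('field "type" is not "service_account"')
    return problems
```

The check only inspects the file's structure; it does not verify that the key is valid or that the account has permission to publish to the topic.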
Data Formats
Google Pub/Sub Publisher writes data to Google Pub/Sub based on the data format that you select. You can use the following data formats:
- Avro
- The stage writes records based on the Avro schema. When you configure the stage, you specify the location of the Avro schema definition.
- Binary
- The stage writes binary data from a single field in the record.
- Delimited
- The destination writes records as delimited data. When you use this data format, the root field must be list or list-map.
- JSON
- The destination writes records as JSON data. You can use one of
the following formats:
- Array - Each file includes a single array. In the array, each element is a JSON representation of each record.
- Multiple objects - Each file includes multiple JSON objects. Each object is a JSON representation of a record.
- Protobuf
- Writes one record in a message. Uses the user-defined message type and the definition of the message type in the descriptor file to generate the message.
- SDC Record
- The destination writes records in the SDC Record data format.
- Text
- The destination writes data from a single text field to the destination system. When you configure the stage, you select the field to use.
- XML
- The destination creates a valid XML document for each record. The
destination requires the record to have a single root field that
contains the rest of the record data. For details and
suggestions for how to accomplish this, see Record Structure Requirement.
The destination can include indentation to produce human-readable documents. It can also validate that the generated XML conforms to the specified schema definition. Records with invalid schemas are handled based on the error handling configured for the destination.
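For the JSON data format described above, the difference between the Array and Multiple objects options can be sketched as follows. The two helper functions are hypothetical, and separating the objects with newlines is an assumption made for the example; the destination's exact serialization may differ.

```python
import json
from typing import Dict, List


def to_json_array(records: List[Dict[str, object]]) -> str:
    """Array: a single JSON array whose elements are the records."""
    return json.dumps(records)


def to_json_objects(records: List[Dict[str, object]]) -> str:
    """Multiple objects: one JSON object per record (newline-separated in this sketch)."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    records = [{"id": 1}, {"id": 2}]
    print(to_json_array(records))    # [{"id": 1}, {"id": 2}]
    print(to_json_objects(records))  # {"id": 1} and {"id": 2} on separate lines
```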
Configuring a Google Pub/Sub Publisher Destination
Configure a Google Pub/Sub Publisher destination to write messages to a Google Pub/Sub topic.