Snowflake
Supported pipeline types: Data Collector
As recommended by Snowflake, the Snowflake destination stages data to one of the Snowflake-supported cloud platforms: Amazon S3 or Microsoft Azure. Then, the destination sends a command to Snowflake to process the staged data. Writing directly to Snowflake using the JDBC Producer destination is possible, but not recommended for performance reasons.
You can use the Snowflake destination to write new data or change data capture (CDC) data to Snowflake. When processing new data, the destination can load data to Snowflake using the COPY command or Snowpipe. When processing CDC data, the destination uses the MERGE command.
The Snowflake destination writes data from record fields to table columns based on matching names. The destination can compensate for data drift by creating new columns and tables in Snowflake when new fields and table references appear in records.
When you configure the Snowflake destination, you specify the Snowflake region, account and connection information and the number of connections to use to write to Snowflake. You can also define additional Snowflake connection properties as needed.
You configure the Snowflake warehouse, database, schema, and tables to use, and optionally enable properties to handle data drift. You specify load method properties and staging details, and optionally define advanced properties for Amazon S3 or Microsoft Azure.
You can configure the root field for the row, and any first-level fields that you want to exclude from the record. You can also configure the destination to replace missing fields or fields containing invalid data types with the specified default values, and to replace newline characters in string fields with a specified character.
Before you use the Snowflake destination, you must install the Snowflake stage library and complete other prerequisite tasks. The Snowflake stage library is an Enterprise stage library that is free for development purposes only. For information about purchasing the stage library for use in production, contact StreamSets.
When using the Snowflake destination, StreamSets recommends configuring the Snowflake warehouse to auto-resume upon receiving new queries.
Sample Use Cases
Here are a couple of common scenarios for using the Snowflake destination:
- Replicating a database
- Say you want to replicate data being written to five tables in an Oracle database schema. You want to write both existing data and incoming CDC data to Snowflake.
To do this, you create two pipelines, one to load the existing data and one to process incoming data, as follows:
- First pipeline for replicating data - The first pipeline uses the multithreaded JDBC Multitable Consumer origin to read from the tables that you want to replicate. To take advantage of Snowflake’s bulk load abilities, you configure the origin to use a very large batch size - somewhere between 20,000 and 50,000 records per batch. You set the number of threads to five to read data from all five tables concurrently, and you increase the connection pool size to five to allow writing to five tables in Snowflake concurrently.
In the pipeline, you use as many processors as needed to process the data. Then, you configure the Snowflake destination to load the data into Snowflake.
If you wanted the data to be immediately available after being written to Snowflake, you would use the default COPY command load method. But since you can tolerate a bit of latency, you use the faster, cheaper Snowpipe to load the data. Using Snowpipe requires performing some prerequisite steps in Snowflake.
After the initial load is complete, you stop the first pipeline and start the second pipeline to process incoming CDC data.
- Second pipeline for CDC data - In the second pipeline, you use the Oracle CDC Client origin and the Snowflake destination. You configure this origin to use a very large batch size as well, somewhere between 20,000 and 50,000 records per batch.
In the destination, you select the Use CDC property to perform CRUD operations when writing to Snowflake. This results in the destination using the MERGE command to load data into Snowflake. You specify a field in the records that contains the table name to use when writing to Snowflake, and you define the key columns for each table.
To improve performance, you also increase the connection pool size. For more information, see Performance Optimization.
- Offloading from Hadoop
- Say you have a Hadoop data lake that you want to move into Snowflake. In this case, you only need one pipeline that includes the multithreaded Hadoop FS Standalone origin, all of the processors that you need, and the Snowflake destination.
Prerequisites
Before you configure the Snowflake destination, complete the following prerequisites:
- Install the Snowflake stage library.
- Create a Snowflake external stage for staging data.
- To use Snowpipe, complete the Snowpipe prerequisites, as well.
Install the Snowflake Stage Library
You must install the Snowflake stage library before using the Snowflake destination. The Snowflake stage library includes the Snowflake JDBC driver that the destination uses to access Snowflake.
The Snowflake stage library is an Enterprise stage library that is free for development purposes only. For information about purchasing the stage library for use in production, contact StreamSets.
You can install the Enterprise stage library using Package Manager for a tarball Data Collector installation or as a custom stage library for a tarball, RPM, or Cloudera Manager Data Collector installation.
Supported Versions
| Data Collector Version | Supported Stage Library Version |
| --- | --- |
| Data Collector 3.8.x | Snowflake Enterprise Library 1.0.1 or 1.0.2 |
| Data Collector 3.7.x | Snowflake Enterprise Library 1.0.1 |
Installing with Package Manager
You can use Package Manager to install the Snowflake stage library on a tarball Data Collector installation.
Installing as a Custom Stage Library
You can install the Snowflake Enterprise stage library as a custom stage library on a tarball, RPM, or Cloudera Manager Data Collector installation.
Create a Snowflake External Stage
To write data to Snowflake, the Snowflake destination stages data to one of the Snowflake cloud service providers, Amazon S3 or Microsoft Azure. Before using the destination in a pipeline, you must create a Snowflake external stage.
- Amazon S3
- To stage data in Amazon S3, create the Snowflake external stage in a bucket in the same S3 region that hosts your Snowflake virtual warehouse. For example, if your Snowflake warehouse is in AWS US West, then create the Snowflake external stage in a bucket in the AWS US West region.
- Microsoft Azure
- To stage data in Microsoft Azure, complete the following tasks:
- Configure Snowflake authentication for the Microsoft Azure Blob Storage container that you want to use.
You can use an SAS token or an Azure account name and key for authentication. For information about configuring SAS token authentication, see Configuring an Azure Container for Loading Data.
- Create a Snowflake external stage in the container.
When you create a Snowflake external stage, you specify a URL that defines the name and location for the stage. Use a trailing slash in the URL to ensure that Snowflake loads all staged data. You might also include a prefix in the stage name to indicate that the external stage is for Data Collector.
For example, the following URL creates an external stage named sdc-externalstage in azure://myaccount.blob.core.windows.net/mycontainer/load/ and loads all staged data to Snowflake:
azure://myaccount.blob.core.windows.net/mycontainer/load/sdc-externalstage/
You can create an Azure stage using the Snowflake web interface or SQL, as sketched after this list. For more information, see Creating an Azure Stage in the Snowflake documentation.
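The exact statements depend on your cloud provider and security setup, but a minimal sketch of creating an external stage with Snowflake SQL might look like the following. The stage names, bucket, container, and credential values are placeholders, not values required by the destination:
```sql
-- Hypothetical S3 external stage; replace the bucket path and key pair
-- with values from your own AWS account.
CREATE STAGE sdc_s3_stage
  URL = 's3://mybucket/load/sdc-externalstage/'
  CREDENTIALS = (AWS_KEY_ID = '<access key id>' AWS_SECRET_KEY = '<secret access key>');

-- Hypothetical Azure external stage secured with a SAS token.
CREATE STAGE sdc_azure_stage
  URL = 'azure://myaccount.blob.core.windows.net/mycontainer/load/sdc-externalstage/'
  CREDENTIALS = (AZURE_SAS_TOKEN = '<sas token>');
```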
Snowpipe Prerequisites
When processing new data, you can use Snowpipe, the Snowflake continuous ingestion engine, to load data to Snowflake tables. You cannot use Snowpipe to process CDC data. Before using Snowpipe, complete the following prerequisite tasks:
- In Snowflake, create a pipe for Snowpipe to use to load data.
For more information, see Create a Pipe in the Snowflake documentation.
- In Snowflake, generate a private key PEM and a public key PEM.
For details, see Using Key-Pair Authentication in the Snowflake documentation. You do not need to generate JSON Web Tokens (JWT) as described in Step 5.
- In Snowflake, assign the public key to the Snowflake user account configured in the destination.
You can use the Snowflake console or the ALTER USER command, as sketched after this list.
- Optionally, to secure the private key PEM and password, use runtime resources or credential stores.
- When you configure the destination, enter the private key PEM and password, and the public key PEM. Or, use a runtime resource or credential store function to reference the information.
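As a rough illustration of the Snowflake-side steps, the statements below create a pipe and assign a public key to a user. The pipe, stage, table, user, and key values are placeholder assumptions; follow the Snowflake documentation linked above for the exact requirements in your account:
```sql
-- Hypothetical pipe that Snowpipe uses to load staged files into a target table.
CREATE PIPE sdc_pipe AS
  COPY INTO sales_db.public.orders
  FROM @sdc_s3_stage;

-- Assign the generated public key PEM to the Snowflake user that the
-- destination connects with. The key value shown is a placeholder.
ALTER USER sdc_user SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';
```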
Load Methods
The Snowflake destination can load data to Snowflake using the following methods:
- COPY command for new data
- The COPY command, the default load method, performs a bulk synchronous load to Snowflake, treating all records as INSERTS. Use this method to write new data to Snowflake tables.
- Snowpipe for new data
- Snowpipe, the Snowflake continuous ingestion service, performs an asynchronous load to Snowflake, treating all records as INSERTS. Use this method to write new data to Snowflake tables. When needed, you can configure the destination to use a custom Snowflake endpoint.
- MERGE command for CDC data
- Like the COPY command, the MERGE command performs a bulk synchronous load to Snowflake. But instead of treating all records as INSERT, it inserts, updates, and deletes records as appropriate. Use this method to write change data capture (CDC) data to Snowflake tables using CRUD operations.
For more information about Snowpipe or the COPY or MERGE commands, see the Snowflake documentation.
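For context only, a bulk load with the COPY command looks roughly like the following in Snowflake SQL. The destination issues the command for you; the table, stage path, and file format shown here are placeholder assumptions:
```sql
-- Load staged files into a table, treating every row as an insert.
COPY INTO sales_db.public.orders
  FROM @sdc_s3_stage/orders/
  FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```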
Performance Optimization
Use the following tips to optimize for performance and cost-effectiveness when using the Snowflake destination:
- Increase the batch size
- The maximum batch size is determined by the origin in the pipeline and typically has a default value of 1,000 records. To take advantage of Snowflake's bulk loading abilities, increase the maximum batch size in the pipeline origin to 20,000-50,000 records. Be sure to increase the Data Collector Java heap size as needed.
- Use multiple threads
- When writing to Snowflake using Snowpipe or the COPY command, you can use multiple threads to improve performance when you include a multithreaded origin in the pipeline. When Data Collector resources allow, using multiple threads enables processing multiple batches of data concurrently.
- Enable additional connections to Snowflake
- When writing to multiple Snowflake tables using the COPY or MERGE commands, increase the number of connections that the Snowflake destination makes to Snowflake. Each additional connection allows the destination to write to an additional table, concurrently.
Row Generation
When writing a record to a table, the Snowflake destination includes all record fields in the resulting row, by default. The destination uses the root field, /, as the basis for the resulting row.
You can configure the Row Field property to specify a map or list-map field in the record as the basis for the row. The resulting record includes only the data from the specified map or list-map field and excludes all other record data. Use this functionality when the data that you want to write to Snowflake exists in a single map or list-map field within the record.
If you want to use the root field, but do not want to include all fields in the resulting row, you can configure the destination to ignore all specified first-level fields.
The Snowflake destination converts all map or list-map fields within the specified root field to the Snowflake Variant data type. The Snowflake destination fully supports the Variant data type.
By default, records with missing fields or with invalid data types in fields are treated as error records. You can configure the destination to replace missing fields and data of invalid types with user-defined default values. Then, you specify the default values to use for each data type. You can also configure the destination to replace newline characters in string fields with a replacement character.
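Because map and list-map data lands in Variant columns, you can query the nested values downstream with Snowflake path notation. A minimal sketch, assuming a hypothetical orders table with a Variant column named address:
```sql
-- Extract nested values from a Variant column created from a map field.
SELECT order_id,
       address:city::STRING AS city,
       address:zip::STRING  AS zip
FROM sales_db.public.orders;
```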
Writing to Multiple Tables
You can use the Snowflake destination to write to multiple tables within a Snowflake schema. To write to multiple tables, you specify a field in the record that specifies the table to write to.
For example, say you have Snowflake tables named after departments in your company, such as Operations, Sales, and Marketing. Also, the records being processed have a dept field with matching values. You can configure the Snowflake destination to write records to the various tables by entering the following expression in the Table property: ${record:value('/dept')}.
When using the COPY or MERGE command to load data, you can configure the Snowflake destination to automatically create tables when a new value appears in the specified field. For example, if the dept field suddenly includes an Engineering department, the destination can create a new Engineering table in Snowflake for the new data. For more information, see Creating Columns and Tables for Data Drift.
When using the COPY or MERGE command to write to multiple Snowflake tables, you might also increase the number of connections that the destination uses for the write. For more information, see Performance Optimization.
Creating Columns and Tables for Data Drift
You can configure the Snowflake destination to automatically compensate for changes in column or table requirements, also known as data drift.
When enabled for data drift, the Snowflake destination creates new columns in Snowflake tables when new fields appear in records. For example, if a record suddenly includes a new Address2 field, the destination creates a new Address2 column in the target table.
By default, the destination creates new columns based on the data in the new fields, such as creating a Double column for decimal data. You can, however, configure the destination to create all new columns as Varchar.
When data drift is enabled, you can also configure the destination to create new tables as needed. For example, say the destination writes data to tables based on the region name in the Region field. When a new SW-3 region shows up in a record, the destination creates a new SW-3 table in Snowflake and writes the record to the new table.
To enable the automatic creation of new columns, select the Data Drift Enabled property on the Snowflake tab. Then, to enable the creation of new tables, select the Table Auto Create property.
Generated Data Types
When creating new tables or creating new columns in existing tables, the Snowflake destination uses field names to generate the new column names and maps record field data types to Snowflake column data types as follows:
| Record Field Data Type | Snowflake Column Data Type |
| --- | --- |
| Byte Array | Binary |
| Char | Char |
| String | Varchar |
| Byte, Integer, Long, Short | Number |
| Decimal, Double, Float | Double |
| Boolean | Boolean |
| Date | Date |
| Datetime | Timestampntz |
| Time | Time |
| Zoned Datetime | Timestamptz |
| Map, List-Map | Variant |
The Snowflake destination fully supports the Variant data type.
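To illustrate these mappings, a record with Integer, String, Decimal, Datetime, and Map fields would produce a table roughly like the one below. The destination creates the table for you when Table Auto Create is enabled; the table and column names here are hypothetical:
```sql
CREATE TABLE sales_db.public.orders (
  order_id  NUMBER,          -- from an Integer field
  customer  VARCHAR,         -- from a String field
  total     DOUBLE,          -- from a Decimal field
  placed_at TIMESTAMP_NTZ,   -- from a Datetime field
  address   VARIANT          -- from a Map field
);
```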
Define the CRUD Operation
The Snowflake destination can insert, update, or delete data when you configure the destination to process CDC data. When processing CDC data, the destination uses the MERGE command to write data to Snowflake.
The destination looks for the CRUD operation to perform in the sdc.operation.type record header attribute. The attribute can contain one of the following numeric values:
- 1 for INSERT
- 2 for DELETE
- 3 for UPDATE
If your pipeline includes a CRUD-enabled origin that processes changed data, the destination simply reads the operation type from the sdc.operation.type header attribute that the origin generates. If your pipeline uses a non-CDC origin, you can use the Expression Evaluator or a scripting processor to define the record header attribute. For more information about Data Collector changed data processing and a list of CDC-enabled origins, see Processing Changed Data.
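The destination builds and runs the MERGE statement for you; the sketch below only illustrates how insert, update, and delete operations map onto MERGE clauses. The target table, staging source, key column, and operation column are placeholder assumptions:
```sql
MERGE INTO sales_db.public.orders AS tgt
USING sales_db.public.orders_staging AS src
  ON tgt.order_id = src.order_id                     -- key column defined in the destination
WHEN MATCHED AND src.op_type = 'DELETE' THEN DELETE
WHEN MATCHED AND src.op_type = 'UPDATE' THEN
  UPDATE SET tgt.customer = src.customer, tgt.total = src.total
WHEN NOT MATCHED AND src.op_type = 'INSERT' THEN
  INSERT (order_id, customer, total)
  VALUES (src.order_id, src.customer, src.total);
```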
AWS Credentials
When the Snowflake destination stages data on Amazon S3, it must pass credentials to Amazon Web Services. Use AWS access key pairs to pass the credentials. The Snowflake destination does not support using IAM roles at this time.
To use AWS access key pairs, specify the Access Key ID and Secret Access Key properties in the destination.
Configuring a Snowflake Destination
Configure a Snowflake destination to write data to Snowflake tables. Before you use the destination in a pipeline, complete the required prerequisites.