The Base64 Field Decoder decodes Base64-encoded data to binary data. Use the processor to decode Base64-encoded data before evaluating data in the field.
The Base64 Field Encoder encodes binary data using Base64. Use the processor to encode binary data that must be sent over channels that expect ASCII data.
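The underlying transformation that both processors perform can be sketched in plain Java with the standard java.util.Base64 API. This is an illustration of the operation, not the processors' implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64FieldExample {
    public static void main(String[] args) {
        // Encode binary field data as Base64 so it can travel over ASCII-only channels.
        byte[] binary = "raw field bytes".getBytes(StandardCharsets.UTF_8);
        String encoded = Base64.getEncoder().encodeToString(binary);

        // Decode the Base64 string back to the original binary data before evaluation.
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(new String(decoded, StandardCharsets.UTF_8)); // raw field bytes
    }
}
```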
The Data Parser processor allows you to parse supported data formats embedded in a field. For example, you can parse NetFlow messages embedded in a byte array field or syslog messages embedded in a string field.
The Databricks ML Evaluator processor uses a machine learning model exported with Databricks ML Model Export to generate evaluations, scores, or classifications of data. This processor is a Technology Preview feature. It is not meant for use in production.
The Encrypt and Decrypt Fields processor encrypts or decrypts field values.
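As an illustration of the kind of symmetric field encryption involved, here is a minimal plain-Java sketch using AES-GCM from javax.crypto. The processor's actual key management and cipher configuration may differ; the key generation and sample value here are purely illustrative:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class FieldCryptoExample {
    public static void main(String[] args) throws Exception {
        // Generate a 256-bit AES key; in practice a managed key would be configured.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        // Encrypt a field value with AES-GCM using a fresh random IV.
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("4111-1111-1111-1111".getBytes(StandardCharsets.UTF_8));

        // Decrypt with the same key and IV to recover the original field value.
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] plaintext = cipher.doFinal(ciphertext);
        System.out.println(new String(plaintext, StandardCharsets.UTF_8));
    }
}
```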
The Field Hasher uses a hash algorithm to hash field data. Use Field Hasher to protect highly sensitive data. For example, you might use Field Hasher to hash Social Security or credit card numbers.
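For example, hashing a value with SHA-256 in plain Java. This is a sketch of the operation, not the processor's code, and the sample value is invented (HexFormat requires Java 17+):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public class FieldHashExample {
    public static void main(String[] args) throws Exception {
        // Hash a sensitive field value with SHA-256; the result is one-way and fixed-length.
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        byte[] hash = digest.digest("123-45-6789".getBytes(StandardCharsets.UTF_8));
        System.out.println(HexFormat.of().formatHex(hash));
    }
}
```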
The Field Masker masks string values based on the selected mask type. You can use variable-length, fixed-length, custom, or regular expression masks. Custom masks can reveal part of the string value.
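The custom and regular expression mask types can be sketched in plain Java. The card number and mask patterns below are illustrative, not the processor's defaults:

```java
public class FieldMaskExample {
    public static void main(String[] args) {
        String card = "4111111111111111";

        // Custom mask revealing only the last four characters of the value.
        String custom = "X".repeat(card.length() - 4) + card.substring(card.length() - 4);

        // Regular-expression mask: mask every digit that is followed by at least four more.
        String regex = card.replaceAll("\\d(?=\\d{4})", "X");

        System.out.println(custom); // XXXXXXXXXXXX1111
        System.out.println(regex);  // XXXXXXXXXXXX1111
    }
}
```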
The Field Merger merges one or more fields in a record to a different location in the record. Use the Field Merger only with records that have a list or map structure.
The Field Order processor orders fields in a map or list-map field and outputs the fields into a list-map or list root field.
Use the Field Renamer to rename fields in a record. You can specify individual fields to rename or use regular expressions to rename sets of fields.
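A regex-based rename of a set of fields can be sketched with a plain Java map standing in for the record; the old_/new_ field names and pattern are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldRenameExample {
    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("old_name", "value1");
        record.put("old_id", 42);

        // Rename every field whose name matches a regular expression,
        // keeping the captured suffix: old_name -> new_name, old_id -> new_id.
        Map<String, Object> renamed = new LinkedHashMap<>();
        record.forEach((name, value) ->
            renamed.put(name.replaceAll("^old_(.*)$", "new_$1"), value));

        System.out.println(renamed); // {new_name=value1, new_id=42}
    }
}
```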
The Field Replacer replaces values in fields with nulls or with new values. Use the Field Replacer to update values or to replace invalid values.
The Field Splitter splits string data based on a regular expression and passes the separated data to new fields. Use the Field Splitter to split complex string values into logical components.
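The operation amounts to a regex-driven split, sketched here in plain Java with an invented error field:

```java
public class FieldSplitExample {
    public static void main(String[] args) {
        // Split an error field like "GM-302 FAILED" into a code and a message,
        // using a regular expression as the separator.
        String error = "GM-302 FAILED";
        String[] parts = error.split("\\s+", 2);
        String errorCode = parts[0];    // GM-302
        String errorMessage = parts[1]; // FAILED
        System.out.println(errorCode + " / " + errorMessage);
    }
}
```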
The Field Type Converter processor converts the data types of fields to compatible data types. You might use the processor to convert the data types of fields before performing calculations. You can also use the processor to change the scale of decimal data.
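Both use cases, converting a type before arithmetic and changing decimal scale, can be sketched in plain Java; the sample values are illustrative:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class FieldConvertExample {
    public static void main(String[] args) {
        // Convert a string field to an integer before doing arithmetic on it.
        int count = Integer.parseInt("42");

        // Change the scale of a decimal field, rounding half up: 3.14159 -> 3.14.
        BigDecimal price = new BigDecimal("3.14159").setScale(2, RoundingMode.HALF_UP);

        System.out.println(count * 2); // 84
        System.out.println(price);     // 3.14
    }
}
```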
The HBase Lookup processor performs key-value lookups in HBase and passes the lookup values to fields. Use the HBase Lookup to enrich records with additional data.
The HTTP Router processor passes records to data streams based on the HTTP method and URL path in the record header attributes.
The JDBC Lookup processor uses a JDBC connection to perform lookups in a database table and pass the lookup values to fields. Use the JDBC Lookup to enrich records with additional data.
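The lookup pattern can be sketched with the standard JDBC API. The connection URL, credentials, table, and column names below are hypothetical; the processor is configured with an equivalent connection and query:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JdbcLookupExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost/demo", "user", "pass");
             PreparedStatement stmt = conn.prepareStatement(
                 "SELECT city FROM customers WHERE customer_id = ?")) {
            // Look up a value keyed by a field from the record.
            stmt.setInt(1, 1001);
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    // Write the looked-up value into a field on the record.
                    System.out.println("city = " + rs.getString("city"));
                }
            }
        }
    }
}
```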
The Kudu Lookup processor performs lookups in a Kudu table and passes the lookup values to fields. Use the Kudu Lookup to enrich records with additional data.
The MLeap Evaluator processor uses a machine learning model stored in an MLeap bundle to generate evaluations, scores, or classifications of data. This processor is a Technology Preview feature. It is not meant for use in production.
The MongoDB Lookup processor performs lookups in MongoDB and passes all values from the returned document to a new list-map field in the record. Use the MongoDB Lookup processor to enrich records with additional data.
The PMML Evaluator processor uses a machine learning model stored in the Predictive Model Markup Language (PMML) format to generate predictions or classifications of data. This processor is a Technology Preview feature. It is not meant for use in production.
The Record Deduplicator evaluates records for duplicate data and routes data to two streams: one for unique records and one for duplicate records. Use the Record Deduplicator to discard duplicate data or route duplicate data through different processing logic.
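The routing idea can be sketched with a seen-set in plain Java. This ignores the processor's configurable comparison fields and windowing and shows only the two-stream split:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupExample {
    public static void main(String[] args) {
        List<String> records = List.of("a", "b", "a", "c", "b");

        // Track values already seen; route first occurrences to the unique
        // stream and repeats to the duplicate stream.
        Set<String> seen = new HashSet<>();
        for (String record : records) {
            if (seen.add(record)) {
                System.out.println("unique:    " + record);
            } else {
                System.out.println("duplicate: " + record);
            }
        }
    }
}
```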
The Redis Lookup processor performs key-value lookups in Redis and passes the lookup values to fields. Use the Redis Lookup to enrich records with additional data.
The Salesforce Lookup processor performs lookups in a Salesforce object and passes the lookup values to fields. Use the Salesforce Lookup to enrich records with additional data.
The Spark Evaluator performs custom processing within a pipeline based on a Spark application that you develop.
The Static Lookup processor performs lookups of key-value pairs that are stored in local memory and passes the lookup values to fields. Use the Static Lookup to store string values in memory that the pipeline can look up at runtime to enrich records with additional data.
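A minimal sketch of an in-memory key-value lookup in plain Java; the country-code mapping is invented for illustration:

```java
import java.util.Map;

public class StaticLookupExample {
    public static void main(String[] args) {
        // Key-value pairs held in local memory, as the processor's configuration would define.
        Map<String, String> lookup = Map.of(
            "US", "United States",
            "DE", "Germany");

        // Enrich a record: look up a field value and write the result to a new field.
        String countryCode = "DE";
        String countryName = lookup.getOrDefault(countryCode, "Unknown");
        System.out.println(countryName); // Germany
    }
}
```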
The Stream Selector passes data to streams based on conditions. Define a condition for each stream of data that you want to create. The Stream Selector uses a default stream to pass records that do not match user-defined conditions.
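The condition-per-stream idea, with a default stream for non-matching records, can be sketched in plain Java; the temperature thresholds are invented for illustration:

```java
public class StreamSelectorExample {
    public static void main(String[] args) {
        int[] temperatures = {15, 42, 27};

        // One condition per stream; records matching no condition go to the default stream.
        for (int t : temperatures) {
            if (t > 40) {
                System.out.println(t + " -> overheated stream");
            } else if (t < 20) {
                System.out.println(t + " -> too-cold stream");
            } else {
                System.out.println(t + " -> default stream");
            }
        }
    }
}
```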
The TensorFlow Evaluator processor uses a TensorFlow machine learning model to generate predictions or classifications of data. The TensorFlow Evaluator processor is a Technology Preview feature. It is not meant for use in production.
The Whole File Transformer processor transforms fully written Avro files to highly efficient, columnar Parquet files. Use the Whole File Transformer in a pipeline that reads Avro files as whole files and writes the transformed Parquet files as whole files.
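The core conversion can be sketched in Java, assuming the Apache Avro and parquet-avro libraries are on the classpath. This is an illustration of the Avro-to-Parquet step, not the processor's implementation:

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class AvroToParquetExample {
    public static void convert(File avroFile, String parquetPath) throws Exception {
        // Read the fully written Avro file and reuse its schema for the Parquet output.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(avroFile, new GenericDatumReader<>())) {
            Schema schema = reader.getSchema();
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(new Path(parquetPath))
                         .withSchema(schema)
                         .build()) {
                // Copy every record from the row-oriented Avro file
                // into the columnar Parquet file.
                for (GenericRecord record : reader) {
                    writer.write(record);
                }
            }
        }
    }
}
```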