An origin stage represents the source for the pipeline. A pipeline can include only one origin stage.
Constrained Application Protocol (CoAP) is a web transfer protocol designed for machine-to-machine (M2M) applications. The CoAP Server origin is a multithreaded origin that listens on a CoAP endpoint and processes the contents of all authorized CoAP requests.
The Directory origin reads data from files in a directory. The origin can use multiple threads to enable the parallel processing of files.
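The following Python sketch illustrates the general idea of parallel file processing: a pool of worker threads reads whole files concurrently, and each line becomes one record. It is not the Directory origin's implementation; the function names, the glob-style `pattern` parameter, and the one-record-per-line dictionary format are assumptions for illustration only.

```python
import concurrent.futures
import pathlib


def read_file(path: pathlib.Path) -> list[dict]:
    """Hypothetical record format: one dict per line of the file."""
    with path.open(encoding="utf-8") as f:
        return [{"file": path.name, "line": line.rstrip("\n")} for line in f]


def process_directory(directory: str, pattern: str = "*", num_threads: int = 4) -> list[dict]:
    """Read every matching file in `directory`, one file per worker thread."""
    files = sorted(p for p in pathlib.Path(directory).glob(pattern) if p.is_file())
    records: list[dict] = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map runs read_file on several files at once but yields the
        # results in the same (sorted) order as the input list.
        for file_records in pool.map(read_file, files):
            records.extend(file_records)
    return records
```

Because each thread owns a whole file, lines from one file stay in order; parallelism comes from working on several files at the same time.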
The Elasticsearch origin is a multithreaded origin that reads data from an Elasticsearch cluster, including Elastic Cloud clusters (formerly Found clusters). The origin generates a record for each Elasticsearch document.
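To show what "a record for each Elasticsearch document" means, the sketch below walks a search response, where matching documents appear under `hits.hits` with their body in `_source`, and produces one record per hit. The `fields`/`metadata` record layout and the function name are hypothetical; they do not describe how the origin structures its records.

```python
def hits_to_records(response: dict) -> list[dict]:
    """Convert an Elasticsearch search response into one record per document."""
    records = []
    for hit in response.get("hits", {}).get("hits", []):
        records.append({
            # The document body becomes the record's fields.
            "fields": dict(hit.get("_source", {})),
            # Document metadata is kept separately (hypothetical layout).
            "metadata": {"index": hit.get("_index"), "id": hit.get("_id")},
        })
    return records


response = {
    "hits": {
        "hits": [
            {"_index": "logs", "_id": "1", "_source": {"level": "INFO", "msg": "started"}},
            {"_index": "logs", "_id": "2", "_source": {"level": "WARN", "msg": "slow query"}},
        ]
    }
}

for record in hits_to_records(response):
    print(record["metadata"]["id"], record["fields"]["level"])
```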
The File Tail origin reads lines of data as they are written to an active file after reading related archived files in the same directory. File Tail generates a record for each line of data.
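The sketch below illustrates only the catch-up phase described above: archived files are read before the active file, and each line becomes a record. It assumes a logrotate-style naming scheme (`app.log.1` is the newest archive, higher numbers are older); that scheme, the function names, and the record format are assumptions, not File Tail settings. A real tail would then keep polling the active file for newly written lines.

```python
import pathlib
import re
from typing import Iterable, Iterator


def files_in_read_order(directory: str, active_name: str) -> list[pathlib.Path]:
    """Return archived files (oldest first) followed by the active file."""
    base = pathlib.Path(directory)
    archive_re = re.compile(re.escape(active_name) + r"\.(\d+)")
    archived = []
    for p in base.iterdir():
        m = archive_re.fullmatch(p.name)
        if m:
            archived.append((int(m.group(1)), p))
    # Under the assumed scheme a higher number means an older file,
    # so sort descending to read the oldest archive first.
    archived.sort(key=lambda t: t[0], reverse=True)
    return [p for _, p in archived] + [base / active_name]


def read_lines_as_records(paths: Iterable[pathlib.Path]) -> Iterator[dict]:
    """Yield one hypothetical record per line, file by file."""
    for path in paths:
        with path.open(encoding="utf-8") as f:
            for line in f:
                yield {"file": path.name, "text": line.rstrip("\n")}
```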
The Google BigQuery origin executes a query job and reads the results from Google BigQuery.
The Google Pub/Sub Subscriber origin consumes messages from a Google Pub/Sub subscription.
The Hadoop FS origin reads data from the Hadoop Distributed File System (HDFS), Amazon S3, or other file systems using the Hadoop FileSystem interface.
The Hadoop FS Standalone origin reads files in HDFS. The origin can use multiple threads to enable the parallel processing of files. The files to be processed must all share a file name pattern and be fully written. You can also configure the origin to read from Azure HDInsight.
The HTTP Server origin is a multithreaded origin that listens on an HTTP endpoint and processes the contents of all authorized HTTP POST and PUT requests. Use the HTTP Server origin to read high volumes of HTTP POST and PUT requests using multiple threads.
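As a client-side illustration, the sketch below builds a JSON POST request for an HTTP Server origin using only the Python standard library. It assumes the origin is configured with an application ID that clients pass in the `X-SDC-APPLICATION-ID` header; the URL, port, application ID, and payload are placeholders, and you should confirm the header and endpoint against your origin configuration.

```python
import json
import urllib.request


def build_request(url: str, app_id: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumed authorization mechanism: the configured application ID.
            "X-SDC-APPLICATION-ID": app_id,
        },
    )


# Placeholder host, port, and application ID.
req = build_request("http://sdc-host:8000/", "my-app", {"a": 1})
# urllib.request.urlopen(req) would send it to a running origin.
```

Because the origin is multithreaded, many such clients can post concurrently.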
The JMS Consumer origin reads data from Java Message Service (JMS).
The Kinesis Consumer origin reads data from Amazon Kinesis Streams.
The MapR DB JSON origin reads JSON documents from MapR DB JSON tables. The origin converts each document into a record.
The MapR FS origin reads files from MapR FS. Use this origin only in pipelines configured for cluster batch pipeline execution mode.
The MapR FS Standalone origin reads files in MapR FS. The origin can use multiple threads to enable the parallel processing of files. The files to be processed must all share a file name pattern and be fully written.
The MapR Streams Consumer origin reads messages from MapR Streams.
The MQTT Subscriber origin subscribes to topics on an MQTT broker to read messages from the broker. The origin functions as an MQTT client that receives messages, generating a record for each message.
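Subscriptions to MQTT topics use topic filters, which can contain the standard MQTT wildcards: `+` matches exactly one topic level and `#` matches the parent level and any number of levels below it. The following sketch shows how a subscriber decides whether a published topic matches a filter. It is a plain illustration of the MQTT matching rules, not part of the origin, and it assumes the filter is already valid.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if `topic` matches `topic_filter` under MQTT wildcard rules."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    # Wildcards in the first level do not match topics starting with "$" (e.g. $SYS).
    if topic.startswith("$") and f_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # multi-level wildcard: this level and everything below
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False  # literal level differs
    return len(f_levels) == len(t_levels)
```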
The Omniture origin processes JSON website usage reports generated by the Omniture reporting APIs. Omniture is also known as the Adobe Marketing Cloud.
The PostgreSQL CDC Client origin processes Write-Ahead Logging (WAL) data to generate change data capture records for a PostgreSQL database. Use the PostgreSQL CDC Client origin to process WAL data from PostgreSQL 9.4 or later. Earlier versions do not support the logical decoding of WAL data that the origin relies on.
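Reading change data from the WAL relies on PostgreSQL logical decoding, which first appeared in PostgreSQL 9.4 and requires the `wal_level` setting to be `logical`. The sketch below checks those two prerequisites from the values returned by `SHOW server_version_num` and `SHOW wal_level`. The function name and messages are hypothetical, and the origin may have additional requirements (for example, a logical decoding output plugin) that this check does not cover.

```python
MIN_VERSION_NUM = 90400  # PostgreSQL 9.4, the first release with logical decoding


def check_cdc_prerequisites(server_version_num: str, wal_level: str) -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # server_version_num is an integer string, e.g. "90400" for 9.4.0.
    if int(server_version_num) < MIN_VERSION_NUM:
        problems.append("PostgreSQL 9.4 or later is required")
    if wal_level != "logical":
        problems.append("wal_level must be set to 'logical'")
    return problems
```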
The RabbitMQ Consumer origin reads AMQP messages from a single RabbitMQ queue.
The Redis Consumer origin reads messages from Redis.
The SDC RPC origin enables connectivity between two SDC RPC pipelines. The SDC RPC origin reads data passed from an SDC RPC destination. Use the SDC RPC origin as part of an SDC RPC destination pipeline.
The SFTP/FTP Client origin reads files from a server using the SSH File Transfer Protocol (SFTP) or the File Transfer Protocol (FTP).
The System Metrics origin reads system metrics from the edge device where StreamSets Data Collector Edge (SDC Edge) is installed. Use the System Metrics origin only in pipelines configured for edge execution mode.
The WebSocket Client origin reads data from a WebSocket server endpoint. Use the origin to read data from a WebSocket resource URL.
The WebSocket Server origin is a multithreaded origin that listens on a WebSocket endpoint and processes the contents of all authorized WebSocket client requests. Use the WebSocket Server origin to read high volumes of WebSocket client requests using multiple threads.
The Windows Event Log origin reads data from a Microsoft Windows event log located on a Windows machine. The origin generates a record for each event in the log.