Start Pipeline

Supported pipeline types: Data Collector

When it receives a record, the Start Pipeline processor starts one or more pipelines in parallel.
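The parallel start can be sketched in Python as follows. This is only an illustration under stated assumptions: `start_pipeline` is a hypothetical placeholder for whatever request actually starts a pipeline on the execution engine or through Control Hub, and the pipeline IDs are made up. No real StreamSets API is called.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def start_pipeline(pipeline_id: str) -> str:
    """Hypothetical placeholder for the call that starts one pipeline.

    A real implementation would contact the execution engine or Control Hub;
    this stub only returns an illustrative state name.
    """
    return "RUNNING"


def start_all(pipeline_ids: List[str]) -> Dict[str, str]:
    """Start every pipeline concurrently and map each ID to its reported state."""
    with ThreadPoolExecutor(max_workers=max(1, len(pipeline_ids))) as pool:
        states = pool.map(start_pipeline, pipeline_ids)
        return dict(zip(pipeline_ids, states))


print(start_all(["weekly-load", "weekly-report"]))
```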

The Start Pipeline processor is an orchestrator stage, and the pipeline that contains the stage is an orchestration pipeline. Orchestrator stages schedule and arrange the tasks that complete a workflow through the orchestration pipeline. For example, an orchestration pipeline can use the Cron Scheduler origin to generate a record every Monday at 6 AM that triggers the Start Pipeline processor, which then starts a pipeline that loads data from the previous week and generates a report. For more information, see Orchestration Pipeline Overview.

The Start Pipeline processor can start pipelines that run on any StreamSets execution engine, such as Data Collector, Data Collector Edge, or Transformer. The processor takes the received record and adds a list of the started pipelines and a field that indicates whether the pipelines finished successfully. Subsequent stages in the orchestration pipeline can use this field to determine the next task. For example, a Stream Selector processor might use the finished status to determine the tasks to complete next.

When you configure the Start Pipeline processor, you specify the URL of the execution engine that runs the pipelines, and you specify the IDs of the pipelines to start along with any runtime parameters to use. For an execution engine registered with Control Hub, you specify the Control Hub URL and the processor starts the pipelines through Control Hub.
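The rule for choosing where start requests go can be expressed as a small helper. The function name, its parameters, and the example Data Collector URL are hypothetical; the sketch only encodes the behavior described above.

```python
from typing import Optional


def start_request_base_url(
    execution_engine_url: str,
    control_hub_enabled: bool,
    control_hub_url: Optional[str] = None,
) -> str:
    """Return the URL that should receive the start requests (hypothetical helper)."""
    if control_hub_enabled:
        # An engine registered with Control Hub is reached through Control Hub.
        if not control_hub_url:
            raise ValueError("Control Hub URL is required when Control Hub is enabled")
        return control_hub_url
    # Otherwise the processor talks to the execution engine directly.
    return execution_engine_url
```

For example, `start_request_base_url("http://sdc.example.com:18630", True, "https://cloud.streamsets.com")` returns the Control Hub URL rather than the engine URL.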

You can configure the processor to reset the origins in the pipelines when possible. You can also configure the processor to run the pipelines in the background. After starting pipelines in the background, the processor immediately updates the received record and passes it to the next stage rather than waiting for the pipelines to finish.

You also configure the user name and password used to run the pipelines, and you can optionally configure SSL/TLS properties.

Data Flow

The data flow in the orchestration pipeline that contains the Start Pipeline processor depends on whether the processor runs the started pipelines in the background.

When processing a record, the Start Pipeline processor starts the specified pipelines. It then adds to the processed record a list map of the started pipelines and a field that indicates whether those pipelines finished successfully. A pipeline has finished successfully when its state is FINISHED.
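A minimal sketch of the record update, assuming the record is a plain Python dict. The field names `startedPipelines` and `finished` are hypothetical placeholders, not the field paths the processor actually writes, and the insertion-ordered dict stands in for the list map.

```python
from typing import Dict


def add_pipeline_results(record: Dict, pipeline_states: Dict[str, str]) -> Dict:
    """Return a copy of the record with the started pipelines and finished status."""
    updated = dict(record)  # leave the incoming record untouched
    # An insertion-ordered dict keyed by pipeline ID mirrors a list map.
    updated["startedPipelines"] = dict(pipeline_states)
    # Successful only if at least one pipeline ran and every one ended FINISHED.
    updated["finished"] = bool(pipeline_states) and all(
        state == "FINISHED" for state in pipeline_states.values()
    )
    return updated
```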

The processor updates and passes the received record to the next stage in the orchestration pipeline either immediately after the pipelines start or after the pipelines finish, depending on whether the pipelines run in the background. Choose whether to run the pipelines in the background based on the data flow needed in the orchestration pipeline.
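The difference between the two modes can be sketched as a single function. Everything here is an assumption for illustration: `start` and `wait_until_done` are hypothetical callables supplied by the caller, and the field names are placeholders.

```python
from typing import Callable, Dict, List


def process_record(
    record: Dict,
    pipeline_ids: List[str],
    run_in_background: bool,
    start: Callable[[str], str],
    wait_until_done: Callable[[str], str],
) -> Dict:
    """Return the updated record at the point the processor would pass it on."""
    # Pipelines are started one after another here only to keep the sketch short.
    started = {pid: start(pid) for pid in pipeline_ids}
    if run_in_background:
        # Passed on immediately: completion is not tracked, so finished is False.
        return {**record, "startedPipelines": started, "finished": False}
    # Otherwise block until every started pipeline reaches a final state.
    final_states = {pid: wait_until_done(pid) for pid in pipeline_ids}
    finished = bool(final_states) and all(s == "FINISHED" for s in final_states.values())
    return {**record, "startedPipelines": final_states, "finished": finished}
```

Passing a `wait_until_done` that records its calls shows that, in background mode, it is never invoked.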

Pipelines Run in Background

When running the started pipelines in the background, the processor updates the processed record and passes it to the next stage in the orchestration pipeline immediately after starting the pipelines. The processed record contains the list map of started pipelines and the finished status of the pipelines. The processor and orchestration pipeline do not track whether the started pipelines finish successfully. Therefore, for pipelines run in the background, the finished status always indicates that the pipelines did not finish successfully.

In this case, you can run other stages in parallel and complete tasks simultaneously. Because the processor does not generate another record when the started pipelines finish, other stages in the orchestration pipeline cannot depend on the completion of the started pipelines or on values from them.

Pipelines Not Run in Background

When not running the started pipelines in the background, the processor updates and passes the processed record to the next stage in the orchestration pipeline after all the started pipelines finish. The processed record contains the list map of started pipelines and the finished status of the pipelines.

In this case, a subsequent stage in the orchestration pipeline can depend on the completion of one of the started pipelines or a value from one of the started pipelines. For example, a Stream Selector processor might use the finished status to determine the tasks to complete next.
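As a simplified illustration of branching on the result (not how a Stream Selector processor is actually configured), the following function picks a next task from the updated record. The field names and task names are hypothetical and assume a record carrying a `finished` flag and a `startedPipelines` map from pipeline ID to final state.

```python
from typing import Dict


def next_task(record: Dict) -> str:
    """Choose the next task name from the finished status (hypothetical names)."""
    if record.get("finished") is True:
        return "generate-report"
    # Name the pipelines that did not reach FINISHED so a follow-up task can act on them.
    states = record.get("startedPipelines", {})
    failed = sorted(pid for pid, state in states.items() if state != "FINISHED")
    return "notify-failure:" + ",".join(failed)
```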

Configuring a Start Pipeline Processor

Configure a Start Pipeline processor to start a Data Collector, Data Collector Edge, or Transformer pipeline. The Start Pipeline processor is an orchestrator stage.

In the Properties panel, on the General tab, configure the following properties:

General Property	Description
Name	Stage name.
Description	Optional description.
Required Fields	Fields that must include data for the record to be passed into the stage. Tip: You might include fields that the stage uses. Records that do not include all required fields are processed based on the error handling configured for the pipeline.
Preconditions	Conditions that must evaluate to TRUE to allow a record to enter the stage for processing. Click Add to create additional preconditions. Records that do not meet all preconditions are processed based on the error handling configured for the stage.
On Record Error	Error record handling for the stage: Discard discards the record. Send to Error sends the record to the pipeline for error handling. Stop Pipeline stops the pipeline; this option is not valid for cluster pipelines.
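The precondition check and the three On Record Error options can be modeled roughly as follows. This is a hypothetical sketch: the names are invented, preconditions are Python callables rather than Data Collector expressions, and required-field failures, which follow pipeline-level error handling, are left out.

```python
from enum import Enum
from typing import Callable, Dict, List


class OnRecordError(Enum):
    DISCARD = "Discard"
    SEND_TO_ERROR = "Send to Error"
    STOP_PIPELINE = "Stop Pipeline"


class StopPipeline(Exception):
    """Signals that the whole pipeline should stop."""


def admit(
    record: Dict,
    preconditions: List[Callable[[Dict], bool]],
    on_error: OnRecordError,
    error_records: List[Dict],
) -> bool:
    """Return True if the record meets every precondition and may enter the stage."""
    if all(check(record) for check in preconditions):
        return True
    if on_error is OnRecordError.SEND_TO_ERROR:
        error_records.append(record)  # hand the record to pipeline error handling
    elif on_error is OnRecordError.STOP_PIPELINE:
        raise StopPipeline("record failed a precondition")
    # Discard, or already sent to error handling: the record does not enter the stage.
    return False
```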

On the Pipeline tab, configure the following properties:

Pipeline Property	Description
Execution Engine URL	URL of the execution engine that runs the pipelines. Execution engines include Data Collector, Data Collector Edge, and Transformer.
Pipelines	List of pipelines to start in parallel. For each pipeline, enter the following values. Pipeline ID: The ID of the pipeline. To find the pipeline ID in the execution engine UI, click the pipeline canvas and then click the General tab in the Properties panel. To find the pipeline ID in Control Hub, expand the pipeline in the Pipelines view and click Show Additional Info. Runtime Parameters: Parameters defined in the pipeline and specified when starting the pipeline. To include another pipeline, click the Add icon. You can use simple or bulk edit mode to specify pipelines.
Reset Origin	Resets the origin before starting a pipeline, if the origin can be reset. For a list of origins that can be reset, see Resetting the Origin.
Control Hub Enabled	Starts pipelines through Control Hub. Select this property when the execution engine is registered with Control Hub.
Control Hub URL	URL of the Control Hub where the execution engine is registered. For Control Hub cloud, enter `https://cloud.streamsets.com`. For Control Hub on-premises, enter the URL provided by your system administrator, for example, `https://<hostname>:18631`.
Run in Background	Runs the started pipelines in the background. When running started pipelines in the background, the stage passes the record to the next stage immediately after starting the pipelines. When not running started pipelines in the background, the stage passes the record to the next stage only after the started pipelines completely finish.
Delay Between State Checks	Milliseconds to wait between checks for the completion status of the started pipelines. Available when not running started pipelines in the background.
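When the started pipelines do not run in the background, the processor checks their completion status repeatedly and waits Delay Between State Checks milliseconds between checks. A rough Python model of that loop follows. `get_state` is a hypothetical callable, and the set of states treated as final is an assumed example, not an authoritative list.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed example of states after which a pipeline is no longer running.
FINAL_STATES = {"FINISHED", "STOPPED", "RUN_ERROR", "START_ERROR"}


def wait_for_pipelines(
    pipeline_ids: Iterable[str],
    get_state: Callable[[str], str],
    delay_ms: int,
) -> Dict[str, str]:
    """Poll until every pipeline reaches a final state; return the final states."""
    pending = set(pipeline_ids)
    final: Dict[str, str] = {}
    while pending:
        for pid in sorted(pending):
            state = get_state(pid)
            if state in FINAL_STATES:
                final[pid] = state
        pending.difference_update(final)  # stop checking pipelines that are done
        if pending:
            time.sleep(delay_ms / 1000)  # the delay is configured in milliseconds
    return final
```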

On the Credentials tab, configure the following properties:

Credentials Property	Description
User Name	User that runs the pipelines. Enter a user name for the execution engine, or enter a Control Hub user name if the engine is registered with Control Hub.
Password	Password for the user. Tip: To secure sensitive information such as user names and passwords, you can use runtime resources or credential stores.

To use SSL/TLS, click the TLS tab and configure the following properties:

TLS Property	Description
Use TLS	Enables the use of TLS.
Keystore File	Path to the keystore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES. For more information about environment variables, see Data Collector Environment Configuration. By default, no keystore is used.
Keystore Type	Type of keystore to use: Java Keystore File (JKS) or PKCS #12 (p12 file). Default is Java Keystore File (JKS).
Keystore Password	Password to the keystore file. A password is optional, but recommended. Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
Keystore Key Algorithm	Algorithm to manage the keystore. Default is SunX509.
Truststore File	Path to the truststore file. Enter an absolute path to the file or a path relative to the Data Collector resources directory: $SDC_RESOURCES. For more information about environment variables, see Data Collector Environment Configuration. By default, no truststore is used.
Truststore Type	Type of truststore to use: Java Keystore File (JKS) or PKCS #12 (p12 file). Default is Java Keystore File (JKS).
Truststore Password	Password to the truststore file. A password is optional, but recommended. Tip: To secure sensitive information such as passwords, you can use runtime resources or credential stores.
Truststore Trust Algorithm	Algorithm to manage the truststore. Default is SunX509.
Use Default Protocols	Uses the default TLSv1.2 transport layer security (TLS) protocol. To use a different protocol, clear this option.
Transport Protocols	TLS protocols to use. To use a protocol other than the default TLSv1.2, click the Add icon and enter the protocol name. You can use simple or bulk edit mode to add protocols. Note: Older protocols are not as secure as TLSv1.2.
Use Default Cipher Suites	Uses a default cipher suite for the SSL/TLS handshake. To use a different cipher suite, clear this option.
Cipher Suites	Cipher suites to use. To use a cipher suite that is not a part of the default set, click the Add icon and enter the name of the cipher suite. You can use simple or bulk edit mode to add cipher suites. Enter the Java Secure Socket Extension (JSSE) name for the additional cipher suites that you want to use.
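Two of the TLS rules above, resolving a relative keystore or truststore path against `$SDC_RESOURCES` and falling back to TLSv1.2 when Use Default Protocols is selected, are sketched below. The function names are hypothetical, and the sketch does not open or validate the store files.

```python
import os
from typing import List, Optional

DEFAULT_PROTOCOLS = ["TLSv1.2"]


def resolve_store_path(path: Optional[str]) -> Optional[str]:
    """Return the path to a keystore or truststore, or None if none is configured."""
    if not path:
        return None  # by default, no keystore or truststore is used
    if os.path.isabs(path):
        return path
    # Relative paths are taken relative to the Data Collector resources directory.
    return os.path.join(os.environ["SDC_RESOURCES"], path)


def effective_protocols(use_default: bool, custom: List[str]) -> List[str]:
    """Return the TLS protocols to offer (hypothetical helper)."""
    return list(DEFAULT_PROTOCOLS) if use_default else list(custom)
```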