Start Pipeline
The Start Pipeline processor is an orchestrator stage. The pipeline that contains the stage is an orchestration pipeline. Orchestrator stages schedule and arrange tasks that complete workflows through the orchestration pipeline. For more information, see Orchestration Pipeline Overview. For example, an orchestration pipeline can use the Cron Scheduler origin to generate a record every Monday at 6 AM and trigger the Start Pipeline processor, which starts a pipeline that loads data from the previous week and generates a report.
The Start Pipeline processor can start pipelines that run on any StreamSets execution engine, such as Data Collector, Data Collector Edge, or Transformer. The processor takes the received record and adds a list of the started pipelines and a field that indicates whether the pipelines finished successfully. Subsequent stages in the orchestration pipeline can use this field to determine the next task. For example, a Stream Selector processor might use the finished status to determine the tasks to complete next.
When you configure the Start Pipeline processor, you specify the URL of the execution engine that runs the pipelines, and you specify the IDs of the pipelines to start along with any runtime parameters to use. For an execution engine registered with Control Hub, you specify the Control Hub URL and the processor starts the pipelines through Control Hub.
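As a rough illustration of the settings described above, the following Python sketch models them as plain data. It is not the processor's actual configuration format: the class names, field names, example URL, pipeline ID, and the `WEEK_OFFSET` parameter are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PipelineToStart:
    # Hypothetical model of one pipeline entry: its ID plus runtime parameters.
    pipeline_id: str
    runtime_parameters: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StartPipelineConfig:
    # Hypothetical model of the stage settings described in the text.
    engine_url: str                    # execution engine URL, or the Control Hub URL
    pipelines: List[PipelineToStart]   # pipelines to start
    control_hub_enabled: bool = False  # True when the engine is registered with Control Hub

    def validate(self) -> List[str]:
        """Return a list of human-readable problems; empty when the config looks usable."""
        problems = []
        if not self.engine_url.startswith(("http://", "https://")):
            problems.append("engine_url must be an http(s) URL")
        if not self.pipelines:
            problems.append("at least one pipeline ID is required")
        for pipeline in self.pipelines:
            if not pipeline.pipeline_id:
                problems.append("pipeline_id must not be empty")
        return problems


# Example values are placeholders, not real endpoints or pipeline IDs.
config = StartPipelineConfig(
    engine_url="https://engine.example.com:18630",
    pipelines=[PipelineToStart("weekly-load-id", {"WEEK_OFFSET": -1})],
)
```

Calling `config.validate()` on the example returns an empty list. A configuration with a non-HTTP URL or no pipelines returns one message per problem.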
You can configure the processor to reset the origins in the pipelines when possible. You can also configure the processor to start the pipelines in the background. After starting pipelines in the background, the processor immediately updates the received record and passes it to the next stage rather than waiting for the pipelines to finish.
You also configure the user name and password used to run the pipelines, and you can optionally configure SSL/TLS properties.
Data Flow
The data flow in the orchestration pipeline that contains the Start Pipeline processor depends on whether the processor runs the started pipelines in the background.
When processing a record, the Start Pipeline processor starts the specified pipelines. The processor then adds two items to the processed record: a list map of the started pipelines and a field that indicates whether those pipelines finished successfully. A pipeline is considered to have finished successfully when it reaches the FINISHED state.
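The record update can be pictured with a small Python sketch. It assumes a record is a plain dictionary and models the list map as an insertion-ordered dictionary keyed by pipeline ID. The field names `startedPipelines` and `finishedSuccessfully` are hypothetical and may not match the names the processor actually writes.

```python
from typing import Dict


def enrich_record(record: dict, pipeline_states: Dict[str, str]) -> dict:
    """Return a copy of the record with the two additions described above.

    pipeline_states maps each started pipeline ID to its last known state.
    Field names are illustrative assumptions, not the processor's real output.
    """
    updated = dict(record)
    # Ordered map of started pipelines (Python dicts preserve insertion order).
    updated["startedPipelines"] = {
        pipeline_id: {"state": state}
        for pipeline_id, state in pipeline_states.items()
    }
    # Successful only when every started pipeline reached the FINISHED state.
    updated["finishedSuccessfully"] = bool(pipeline_states) and all(
        state == "FINISHED" for state in pipeline_states.values()
    )
    return updated


all_done = enrich_record({"trigger": "cron"}, {"pipeline-a": "FINISHED"})
one_failed = enrich_record(
    {"trigger": "cron"},
    {"pipeline-a": "FINISHED", "pipeline-b": "RUN_ERROR"},
)
```

In this sketch, `all_done["finishedSuccessfully"]` is true, while `one_failed["finishedSuccessfully"]` is false because one pipeline ended in a state other than FINISHED.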
The processor updates and passes the received record to the next stage in the orchestration pipeline either immediately after the pipelines start or after the pipelines finish, depending on whether the pipelines run in the background. Choose whether to run the pipelines in the background based on the data flow needed in the orchestration pipeline.
Pipelines Run in Background
When running the started pipelines in the background, the processor updates and passes the processed record to the next stage in the orchestration pipeline immediately after starting the pipelines. The processed record contains the list map of started pipelines and the finished status of the pipelines. The processor and the orchestration pipeline do not track whether the started pipelines finish successfully. As a result, for pipelines run in the background, the finished status always reports that the pipelines did not finish successfully, regardless of their actual outcome.
In this case, you can run other stages in parallel and complete tasks simultaneously. Because the processor does not generate another record when the started pipelines finish, other stages in the orchestration pipeline cannot depend on the completion of or values from the started pipelines.
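The background behavior can be simulated in a few lines of Python. This is only a sketch of the semantics described above, not how the processor is implemented: threads stand in for started pipelines, and `start_in_background`, `fake_pipeline`, and the record field names are hypothetical.

```python
import threading
from typing import Callable, List


def start_in_background(
    record: dict, pipeline_ids: List[str], run_pipeline: Callable[[str], str]
) -> dict:
    """Start each pipeline on its own thread and return without waiting."""
    for pipeline_id in pipeline_ids:
        threading.Thread(target=run_pipeline, args=(pipeline_id,), daemon=True).start()
    updated = dict(record)
    updated["startedPipelines"] = list(pipeline_ids)
    # Completion is not tracked in background mode, so the status is always False.
    updated["finishedSuccessfully"] = False
    return updated


release = threading.Event()  # lets the example control when the fake pipeline ends
completed: List[str] = []


def fake_pipeline(pipeline_id: str) -> str:
    # Stand-in for a long-running pipeline: blocks until the example releases it.
    release.wait()
    completed.append(pipeline_id)
    return "FINISHED"


result = start_in_background({"id": 1}, ["pipeline-a"], fake_pipeline)
still_running = not completed  # the record came back while the pipeline was still blocked
release.set()                  # allow the fake pipeline to finish afterwards
```

The record is returned while the fake pipeline is still running, and nothing updates it afterward. This is why later stages cannot rely on the outcome of a pipeline started in the background.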
Pipelines Not Run in Background
When not running the started pipelines in the background, the processor updates and passes the processed record to the next stage in the orchestration pipeline after all the started pipelines finish. The processed record contains the list map of started pipelines and the finished status of the pipelines.
In this case, a subsequent stage in the orchestration pipeline can depend on the completion of one of the started pipelines or a value from one of the started pipelines. For example, a Stream Selector processor might use the finished status to determine the tasks to complete next.
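The following Python sketch shows the same idea in miniature: wait for every started pipeline, record the outcome, and then branch on it much as a Stream Selector condition would. The function names, record field names, and the stream names `generate_report` and `send_alert` are hypothetical, and the pipelines are simulated by an ordinary function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def start_and_wait(
    record: dict, pipeline_ids: List[str], run_pipeline: Callable[[str], str]
) -> dict:
    """Run all pipelines and return only after every one has ended."""
    with ThreadPoolExecutor(max_workers=max(1, len(pipeline_ids))) as pool:
        # pool.map preserves input order; leaving the with-block waits for all tasks.
        states = list(pool.map(run_pipeline, pipeline_ids))
    updated = dict(record)
    updated["startedPipelines"] = dict(zip(pipeline_ids, states))
    updated["finishedSuccessfully"] = bool(states) and all(
        state == "FINISHED" for state in states
    )
    return updated


def route(record: dict) -> str:
    """Pick the next task from the finished status, like a two-condition selector."""
    return "generate_report" if record.get("finishedSuccessfully") else "send_alert"


def simulated_pipeline(pipeline_id: str) -> str:
    # Stand-in for a real pipeline run: IDs ending in "-bad" end in an error state.
    return "RUN_ERROR" if pipeline_id.endswith("-bad") else "FINISHED"


ok_record = start_and_wait({"week": "previous"}, ["load", "aggregate"], simulated_pipeline)
failed_record = start_and_wait({"week": "previous"}, ["load", "aggregate-bad"], simulated_pipeline)
```

Here `route(ok_record)` selects `generate_report` and `route(failed_record)` selects `send_alert`, because the status is computed only after all pipelines have ended.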
Configuring a Start Pipeline Processor
Configure a Start Pipeline processor to start a Data Collector, Data Collector Edge, or Transformer pipeline. The Start Pipeline processor is an orchestrator stage.