Oracle Bulkload
Supported pipeline types: Data Collector
The Oracle Bulkload origin reads all available data from multiple Oracle tables, then stops the pipeline. The origin can use multiple threads to enable the parallel processing of data.
Use the Oracle Bulkload origin to quickly read static database tables, such as when you want to migrate tables to another database or system. Do not use the origin to read from tables that might change as the pipeline runs.
After migrating data from static tables, you can use a separate pipeline that includes the Oracle CDC Client origin to process CDC data from LogMiner redo logs, or the JDBC Multitable Consumer origin to read data from tables continuously.
This origin is a Technology Preview feature. It is not meant for use in production.
When you configure the Oracle Bulkload origin, you specify connection information and the tables to read. You can also configure advanced properties, such as the number of threads to use, the number of batches to include in each transaction request, and the maximum batch size.
The origin can generate events for an event stream. For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Prerequisites
Before you use the Oracle Bulkload origin, complete the following prerequisite tasks:
Install the Oracle Bulkload Stage Library
You must install the Oracle Bulkload stage library before using the Oracle Bulkload origin.
The Oracle Bulkload stage library is an Enterprise stage library that is free for development purposes only. For information about purchasing the stage library for use in production, contact StreamSets.
You can install the Enterprise stage library using Package Manager for a tarball Data Collector installation or as a custom stage library for a tarball, RPM, or Cloudera Manager Data Collector installation.
Supported Versions
| Data Collector Version | Supported Stage Library Version |
| --- | --- |
| Data Collector 3.8.x | Oracle Enterprise Library 1.0.0 |
Installing with Package Manager
You can use Package Manager to install the Oracle Enterprise stage library on a tarball Data Collector installation.
Installing as a Custom Stage Library
You can install the Oracle Enterprise stage library as a custom stage library on a tarball, RPM, or Cloudera Manager Data Collector installation.
Installing the Oracle JDBC Driver
- Download the Oracle JDBC driver from the Oracle website.
  Note: Writing XML data to Oracle requires installing the Oracle Data Integrator Driver for XML. For more information, see the Oracle documentation.
- Install the driver as an external library for the Oracle Enterprise stage library.
For information about installing additional drivers, see Install External Libraries.
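After installing the driver, you might want to confirm that it loads and can reach the database before starting a pipeline. The following standalone Java sketch is one way to do that outside of Data Collector; the connection URL, user name, and password are placeholder values, not settings from this documentation.

```java
import java.sql.Connection;
import java.sql.DriverManager;

public class OracleDriverCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- replace with your own host, port,
        // service name, and credentials.
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1";
        String user = "sales";
        String password = "changeit";

        // Loads the Oracle JDBC driver class; fails fast if the driver jar
        // is not on the classpath.
        Class.forName("oracle.jdbc.OracleDriver");

        // Opens and immediately closes a connection to verify connectivity.
        try (Connection conn = DriverManager.getConnection(url, user, password)) {
            System.out.println("Connected to: "
                + conn.getMetaData().getDatabaseProductVersion());
        }
    }
}
```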
Batch Processing
Unlike most Data Collector origins, the Oracle Bulkload origin performs batch processing only. After processing all data, it stops the pipeline, rather than waiting for additional data as with streaming pipelines.
The Oracle Bulkload origin does not maintain an offset during processing. Each time that you run a pipeline that includes the Oracle Bulkload origin, the origin processes all available data in the specified tables, then stops the pipeline.
Schema and Table Names
When you configure the Oracle Bulkload origin, you specify the tables that you want to read. To specify the tables, you define the schema and a table name pattern.
You can use SQL wildcards to define a set of tables within a schema or across multiple schemas.
For example, say you want to process all tables in the sales schema with names that start with SALES_. You can use the following configuration to specify the tables to process:
- Schema: sales
- Table Name Pattern: SALES_%
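To preview which tables a schema and table name pattern will match, you can run the same wildcard pattern against Oracle's data dictionary before starting the pipeline. The following Java sketch is illustrative only: it assumes the sales schema and SALES_% pattern from the example above, and the connection details are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PreviewMatchingTables {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- replace with your own.
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1";

        // Schema and table name pattern from the example above. Oracle stores
        // object names in upper case by default, so adjust the owner value
        // if your schema name differs.
        String schema = "SALES";
        String pattern = "SALES_%";

        String sql = "SELECT table_name FROM all_tables "
                   + "WHERE owner = ? AND table_name LIKE ? ORDER BY table_name";

        try (Connection conn = DriverManager.getConnection(url, "sales", "changeit");
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setString(1, schema);
            stmt.setString(2, pattern);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("table_name"));
                }
            }
        }
    }
}
```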
Multithreaded Processing
The Oracle Bulkload origin performs parallel processing and enables the creation of a multithreaded pipeline.
When you start the pipeline, the Oracle Bulkload origin retrieves the list of tables defined in the table configuration. The origin then uses multiple concurrent threads for processing based on the Maximum Pool Size property on the Advanced tab.
As the pipeline runs, Oracle creates blocks of data in memory. The Oracle Bulkload origin creates a task from a block of data and passes it to an available pipeline runner. The pipeline runner creates batches from the task for processing based on the maximum batch size configured for the origin.
A pipeline runner is a sourceless pipeline instance - an instance of the pipeline that includes all of the processors and destinations in the pipeline and performs all pipeline processing after the origin. Each pipeline runner processes one batch at a time, just like a pipeline that runs on a single thread.
When tasks created from Oracle blocks are smaller than desired, such as when they are smaller than the maximum batch size, you can configure the origin to merge small tasks. Use the Minimum Task Size property on the Advanced tab to specify the minimum number of records to include in a task. When set, smaller tasks are merged to enable more efficient processing.
Multithreaded pipelines preserve the order of records within each batch, just like single-threaded pipelines. However, because tasks are processed by different pipeline runners, the order in which batches are written to destinations is not guaranteed.
For more information about multithreaded pipelines, see Multithreaded Pipeline Overview.
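Conceptually, this model resembles a fixed-size pool of workers pulling independent tasks from a queue: work proceeds in parallel, record order is preserved within each task, but tasks finish in no particular order. The following Java sketch illustrates the analogy only; it is not Data Collector's implementation, and the pool size and task count are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolSizeAnalogy {
    public static void main(String[] args) throws InterruptedException {
        // Analogous to the Maximum Pool Size property: the number of
        // concurrent workers (pipeline runners).
        int maxPoolSize = 4;
        ExecutorService runners = Executors.newFixedThreadPool(maxPoolSize);

        // Analogous to tasks created from Oracle blocks of data.
        for (int task = 1; task <= 10; task++) {
            int taskId = task;
            runners.submit(() -> {
                // Each "pipeline runner" processes one task at a time;
                // completion order across tasks is not guaranteed.
                System.out.println(Thread.currentThread().getName()
                    + " processed task " + taskId);
            });
        }

        runners.shutdown();
        runners.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```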
Event Generation
The Oracle Bulkload origin can generate events that you can use in an event stream. You can use Oracle Bulkload events in any logical way. For example:
- With the Email executor to send a custom email after receiving an event. For an example, see Case Study: Sending Email.
- With a destination to store event information. For an example, see Case Study: Event Storage.
For more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.
Event Record
Event records generated by the Oracle Bulkload origin have the following event-related record header attributes:

| Record Header Attribute | Description |
| --- | --- |
| sdc.event.type | Event type. Uses the following type: table-finished. |
| sdc.event.version | Integer that indicates the version of the event record type. |
| sdc.event.creation_timestamp | Epoch timestamp when the stage created the event. |
The Oracle Bulkload origin can generate the following event record:
- table-finished
- The Oracle Bulkload origin generates a table-finished event record when the origin completes processing all data within a table.
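If you route these event records to downstream logic, the attribute to branch on is the sdc.event.type record header attribute described above. The following Java sketch uses a hypothetical helper that receives header attributes as a plain map; the map and its sample values are assumptions for illustration, not the origin's actual event payload.

```java
import java.util.Map;

public class EventTypeRouter {

    // Hypothetical helper: headers is assumed to be a map of record header
    // attributes such as the ones listed in the table above.
    static void handleEvent(Map<String, String> headers) {
        String eventType = headers.get("sdc.event.type");

        if ("table-finished".equals(eventType)) {
            // The origin finished reading a table; for example, trigger a
            // notification or record the completion somewhere durable.
            System.out.println("Table finished at epoch "
                + headers.get("sdc.event.creation_timestamp"));
        } else {
            System.out.println("Ignoring event type: " + eventType);
        }
    }

    public static void main(String[] args) {
        // Example values only; the version and timestamp are placeholders.
        handleEvent(Map.of(
            "sdc.event.type", "table-finished",
            "sdc.event.version", "1",
            "sdc.event.creation_timestamp", "1561939200000"));
    }
}
```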
Configuring an Oracle Bulkload Origin
Configure an Oracle Bulkload origin to read data from one or more static database tables. This origin is a Technology Preview feature. It is not meant for use in production.
Before you use the origin in a pipeline, complete the prerequisite tasks.