JDBC Lookup
The JDBC Lookup processor performs a lookup on a database table. The processor can return the first matching row, all matching rows, a count of matching rows, or a boolean value that indicates whether a match was found.
When you configure the JDBC Lookup processor, you specify database connection information, the name of the lookup table, and any additional JDBC configuration properties that you want to use. You can optionally configure advanced properties related to the JDBC driver.
You can also use a connection to configure the processor.
You configure the record field to use and the table column to match against. You also specify the operator to use. You select the information to return, then configure related properties.
When returning one or more records, you specify the columns to return and optionally define a prefix for the resulting field names to prevent adding duplicate fields to the record. You can specify columns to sort by and the sort order. When returning multiple rows, you can specify a maximum number of rows to return.
When returning a count or boolean value, you define a name for the field to contain the results. If the field does not exist, the processor creates it.
If the lookup table is static, you can configure the processor to load the table only once, enabling the processor to cache and reuse the data for the duration of the pipeline run.
If not loading only once, and if the processor passes data to multiple stages, you might enable caching to improve pipeline performance.
Before using the JDBC Lookup processor, verify if you need to install a JDBC driver.
Database Vendors and Drivers
The JDBC Lookup processor can return data from multiple database vendors.
| Database Vendor | Versions and Drivers | 
|---|---|
| Microsoft SQL Server | SQL Server 2017 with the SQL Server JDBC 8.4.0 JRE8 driver | 
| MySQL | MySQL 5.7 with the MySQL Connector/J 8.0.12 driver | 
| Oracle | Oracle 11g with the Oracle 19.3.0 JDBC driver | 
| PostgreSQL | PostgreSQL 9.6.2 with the PostgreSQL 42.2.5 driver | 
Installing the JDBC Driver
- Microsoft SQL Server
- PostgreSQL
When using the stage to connect to any other database, you must install the JDBC driver for the database. Install the driver as an external library for the JDBC stage library.
By default, Transformer bundles a JDBC driver into the launched Spark application so that the driver is available on each node in the cluster. If you prefer to manually install an appropriate JDBC driver on each Spark node, you can configure the stage to skip bundling the driver on the Advanced tab of the stage properties.
- Apache Derby
- IBM DB2
- Microsoft SQL Server
- MySQL
- Oracle
- PostgreSQL
- Teradata
If you install a custom JDBC driver or a driver provided by another vendor, you must specify the JDBC driver class name on the Advanced tab of the stage.
Configuring a JDBC Lookup Processor
Configure a JDBC Lookup processor to perform lookups on a database table. Before using the processor, verify if you need to install a JDBC driver.