Install External Libraries
Install external libraries to make them available to Data Collector stages. If you use multiple stage libraries for a particular stage, install the external libraries for each stage library so that they are available to all versions of the stage.
You can install external libraries using the Package Manager in the Data Collector user interface, or you can install them manually.
- Before you use the following stages, install JDBC drivers for the implementation that you want to use:
- JDBC Multitable Consumer origin
- JDBC Query Consumer origin
- MySQL Binary Log origin
- Oracle CDC Client origin
- PostgreSQL CDC Client origin
- SQL Server CDC Client origin
- SQL Server Change Tracking origin
- JDBC Lookup processor
- JDBC Tee processor
- PostgreSQL Metadata processor
- SQL Parser processor, when using the database to resolve the schema
- Google Bigtable destination
- JDBC Producer destination
- JDBC Query executor
For example, to use the JDBC Query Consumer or the JDBC Producer with Oracle, install the Oracle JDBC drivers.
- Before you use the Hadoop FS origin to read from non-HDFS systems, install all required file system application JAR files. See the file system documentation for details about the files to install.
- Before you use the Spark Evaluator processor, install the Spark application JAR file and any dependencies other than the streamsets-datacollector-api, streamsets-datacollector-spark-api, and spark-core libraries.
- You can install external Java libraries to call external Java code from the scripting processors: the Groovy, JavaScript, and Jython Evaluator processors.
- You can install the DataStax Enterprise (DSE) Java driver to configure the Cassandra destination to use DSE username and password authentication or Kerberos authentication.
- Before you use the Google Bigtable destination, install the BoringSSL library.
- Before you use the JMS Consumer origin or the JMS Producer destination, install the JMS drivers for the implementation that you are using.
- You can install the Impala JDBC driver under the stage library selected for the executor. For more information, see Installing the Impala Driver.
Install Using the Package Manager
To install external libraries using the Package Manager, complete the following general steps:
- Set up an external directory to store the libraries.
- Use the Package Manager within Data Collector to install the external libraries.
Step 1. Set Up an External Directory
Before you install external libraries, set up a local directory external to the Data Collector installation directory for the libraries. Use an external directory to enable use of the libraries after Data Collector upgrades. Use the required procedure for your installation type.
Setting Up for RPM and Tarball
Before you install external libraries for an RPM or tarball installation, set up an external directory to store the libraries.
- Create a local directory external to the Data Collector installation directory. For example, if you installed Data Collector in the following directory:
/opt/sdc/
you might create the external directory at:
/opt/sdc-extras
- Grant the user who starts Data Collector ownership on the external directory. For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
chown -R sdc:sdc /opt/sdc-extras
- Add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable to the appropriate file and point it to the external directory. Modify environment variables using the method required by your installation type.
Set the environment variable as follows:
export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"
For example:
export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
- When using the Java Security Manager, which is enabled by default, update the Data Collector security policy to include the external directory as follows:
  - In the Data Collector configuration directory, open the security policy file, $SDC_CONF/sdc-security.policy.
  - Add the following lines to the file:
// user-defined external directory
grant codebase "file://<external directory>-" {
  permission java.security.AllPermission;
};
For example:
// user-defined external directory
grant codebase "file:///opt/sdc-extras/-" {
  permission java.security.AllPermission;
};
- Restart Data Collector.
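The setup steps above can be sketched as a few small shell functions. This is only an illustration under stated assumptions: the function names (setup_extras_dir, env_line, policy_block) are hypothetical and not part of Data Collector, the chown step needs sufficient privileges, and the generated lines still have to be added by hand to the environment file and to $SDC_CONF/sdc-security.policy for your installation type.

```shell
# Hypothetical helpers sketching the RPM/tarball setup steps.

# Create the external directory and, if an owner is given (for example
# sdc:sdc), hand the directory and its contents to that user and group.
setup_extras_dir() {
  local extras_dir="$1"
  local owner="${2:-}"
  mkdir -p "$extras_dir" || return 1
  if [ -n "$owner" ]; then
    chown -R "$owner" "$extras_dir" || return 1
  fi
}

# Print the line to add to the Data Collector environment file.
env_line() {
  printf 'export STREAMSETS_LIBRARIES_EXTRA_DIR="%s"\n' "$1"
}

# Print the grant block to add to the security policy file.
# Pass the external directory with a trailing slash, as in the example above.
policy_block() {
  printf '// user-defined external directory\n'
  printf 'grant codebase "file://%s-" {\n' "$1"
  printf '  permission java.security.AllPermission;\n'
  printf '};\n'
}
```

For the example directory used in this section, you might run setup_extras_dir /opt/sdc-extras sdc:sdc, then review the output of env_line /opt/sdc-extras/ and policy_block /opt/sdc-extras/ before copying it into the respective files.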
Setting Up for Cloudera Manager
Before you install external libraries for a Cloudera Manager installation, set up an external directory to store the libraries.
- In Cloudera Manager, select the StreamSets service and then click Configuration.
- On the Configuration page, in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory, as follows:
export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"
For example:
export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
By default, the path is /var/lib/sdc.
- Create the /opt/sdc-extras/ directory on every node that runs Data Collector.
- Grant the user who starts Data Collector ownership on the external directory added to every node. For example, if you use the default system user and group named sdc to run Data Collector as a service, use the following command to change the owner of the external directory and all files in the directory to sdc:sdc:
chown -R sdc:sdc /opt/sdc-extras
- When using the Java Security Manager, which is enabled by default, update the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-security.policy property to include the external directory as follows:
// user-defined external directory
grant codebase "file://<external directory>-" {
  permission java.security.AllPermission;
};
For example:
// user-defined external directory
grant codebase "file:///opt/sdc-extras/-" {
  permission java.security.AllPermission;
};
- Restart Data Collector.
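Because the directory and ownership steps must be repeated on every node, it can help to generate the per-node commands first and review them before running anything. The following is a minimal sketch, assuming you have ssh access to each node with enough privileges to run mkdir and chown there; the function name print_node_commands and the node names in the usage note are hypothetical.

```shell
# Hypothetical helper: print one ssh command per node that creates the
# external directory and changes its owner. Nothing is executed here;
# the output is meant to be reviewed first.
print_node_commands() {
  local extras_dir="$1"
  local owner="$2"
  shift 2
  local node
  for node in "$@"; do
    printf 'ssh %s "mkdir -p %s && chown -R %s %s"\n' \
      "$node" "$extras_dir" "$owner" "$extras_dir"
  done
}
```

For example, print_node_commands /opt/sdc-extras sdc:sdc node1 node2 prints two commands, one per node. Printing instead of executing is a deliberate choice here, since the right way to reach the nodes depends on your cluster.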
Step 2. Install External Libraries
After you've set up the external directory, use the Package Manager within Data Collector to install external libraries.
Install Manually
To manually install external libraries, use the required procedure for your installation type.
Installing Manually for RPM and Tarball
To manually install external libraries for an RPM or tarball installation, perform the following steps:
- Create a local directory external to the Data Collector installation directory. For example, if you installed Data Collector in the following directory:
/opt/sdc/
you might create the external directory at:
/opt/sdc-extras
- Create subdirectories for each set of external libraries based on the stage library name as follows:
/opt/sdc-extras/<stage library name>/lib/
For example, to install drivers for stages included with the JDBC stage library, create the following subdirectory:
/opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/
To also install drivers for stages included with the JMS stage library, create the following subdirectory:
/opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library. For example, say you want to use an external library with the Spark Evaluator processor, but you use two versions of the processor - each from a different stage library. To make the external library available to both processor versions, you must upload the external library to both stage libraries.
- Copy the external libraries to the appropriate subdirectories.
- Add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable to the appropriate file and point it to the external directory. Modify environment variables using the method required by your installation type.
Set the environment variable as follows:
export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"
For example:
export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
- When using the Java Security Manager, which is enabled by default, update the Data Collector security policy to include the external directory as follows:
  - In the Data Collector configuration directory, open the security policy file, $SDC_CONF/sdc-security.policy.
  - Add the following lines to the file:
// user-defined external directory
grant codebase "file://<external directory>-" {
  permission java.security.AllPermission;
};
For example:
// user-defined external directory
grant codebase "file:///opt/sdc-extras/-" {
  permission java.security.AllPermission;
};
- Restart Data Collector.
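The subdirectory and copy steps above can be sketched as a single shell function. This is a minimal illustration, not a StreamSets tool: the function name install_external_libs is hypothetical, and the JAR path in the usage note is a placeholder for whatever driver files you downloaded.

```shell
# Hypothetical helper: copy one or more JAR files into
# <external directory>/<stage library name>/lib/, creating the
# subdirectory if needed, and print the target directory.
install_external_libs() {
  local extras_dir="$1"
  local stage_lib="$2"
  shift 2
  # Strip a trailing slash so the resulting path has no double slash.
  local target="${extras_dir%/}/${stage_lib}/lib"
  local jar
  mkdir -p "$target" || return 1
  for jar in "$@"; do
    cp "$jar" "$target/" || return 1
  done
  printf '%s\n' "$target"
}
```

For example, install_external_libs /opt/sdc-extras streamsets-datacollector-jdbc-lib /path/to/driver.jar copies the placeholder driver into the JDBC stage library subdirectory. To cover the case described in the note, where one external library must be available to several stage libraries, call the function once per stage library name with the same JAR.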
Installing Manually for Cloudera Manager
To manually install external libraries for an installation with Cloudera Manager, perform the following steps:
- In Cloudera Manager, select the StreamSets service and then click Configuration.
- On the Configuration page, in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field, add the STREAMSETS_LIBRARIES_EXTRA_DIR environment variable and point it to the external directory, as follows:
export STREAMSETS_LIBRARIES_EXTRA_DIR="<external directory>"
For example:
export STREAMSETS_LIBRARIES_EXTRA_DIR="/opt/sdc-extras/"
By default, the path is /var/lib/sdc.
- On every node that runs Data Collector, create subdirectories for each set of external libraries based on the stage library name as follows:
$STREAMSETS_LIBRARIES_EXTRA_DIR/<stage library name>/lib/
For example, to install drivers for JDBC, create the following subdirectory on every node:
/opt/sdc-extras/streamsets-datacollector-jdbc-lib/lib/
To also install drivers for JMS, create the following subdirectory on every node:
/opt/sdc-extras/streamsets-datacollector-jms-lib/lib/
Note: If you use multiple stage libraries for a particular stage, and you want to use an external library with all stage libraries, you must install the external library for each stage library. For example, say you want to use an external library with the Spark Evaluator processor, but you use two versions of the processor - each from a different stage library. To make the external library available to both processor versions, you must upload the external library to both stage libraries.
- Copy the external libraries to the appropriate subdirectories on every node.
- When using the Java Security Manager, which is enabled by default, update the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-security.policy property to include the external directory as follows:
// user-defined external directory
grant codebase "file://<external directory>-" {
  permission java.security.AllPermission;
};
For example:
// user-defined external directory
grant codebase "file:///opt/sdc-extras/-" {
  permission java.security.AllPermission;
};
- Restart Data Collector.
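Since the same subdirectories and files have to exist on every node, a quick per-node check of the layout may be useful before restarting. The sketch below assumes the directory layout described above; the function name list_external_libs is hypothetical, and it relies on find supporting -maxdepth, which is true of the GNU and BSD versions.

```shell
# Hypothetical helper: for each <stage library name>/lib/ directory under
# the external directory, print the stage library name and the number of
# .jar files found there. Falls back to STREAMSETS_LIBRARIES_EXTRA_DIR
# when no argument is given.
list_external_libs() {
  local extras_dir="${1:-${STREAMSETS_LIBRARIES_EXTRA_DIR:-}}"
  local lib_dir
  local count
  if [ -z "$extras_dir" ]; then
    echo "external directory not set" >&2
    return 1
  fi
  for lib_dir in "${extras_dir%/}"/*/lib; do
    # Skip the unexpanded pattern when nothing matches.
    [ -d "$lib_dir" ] || continue
    count="$(find "$lib_dir" -maxdepth 1 -type f -name '*.jar' | wc -l | tr -d ' ')"
    printf '%s %s\n' "$(basename "$(dirname "$lib_dir")")" "$count"
  done
}
```

Running list_external_libs /opt/sdc-extras on each node and comparing the output is one way to spot a node where a stage library subdirectory is missing or empty.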