You can install the full Data Collector
tarball and start it manually on all supported operating systems.
When you start Data Collector manually, it runs as the system user account that is logged into the command prompt when you run the
launch command. Alternatively, you can impersonate another user account when you run
the command.
-
Download the full StreamSets Data Collector tarball from the StreamSets website: https://streamsets.com/opensource.
-
Extract the tarball to the desired location.
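As a minimal sketch of the extraction step, assuming the download is a gzip-compressed tarball: the file name and the target directory below are hypothetical placeholders, so substitute the name of the file you downloaded and the location you want.

```shell
# Hypothetical names: adjust the tarball file name and target directory.
TARBALL="${TARBALL:-$HOME/Downloads/streamsets-datacollector-all.tgz}"
INSTALL_DIR="${INSTALL_DIR:-$HOME/streamsets}"

# Create the target directory if it does not exist yet.
mkdir -p "$INSTALL_DIR"

if [ -f "$TARBALL" ]; then
  # -x extract, -z decompress gzip, -f read from file, -C extract into directory
  tar -xzf "$TARBALL" -C "$INSTALL_DIR"
else
  echo "Tarball not found: $TARBALL" >&2
fi
```

The directory that the tarball unpacks into becomes $SDC_DIST in the steps that follow.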
-
For a production environment, configure the directories used to store
configuration, data, log, and resource files so that they are outside of
$SDC_DIST, the base Data Collector runtime directory where you extracted the tarball.
Directories outside of the runtime directory remain available after you
upgrade Data Collector.
For a development or test environment, you can use the default locations
within the $SDC_DIST runtime directory. However, StreamSets recommends that
you use directories outside of the runtime directory for all environments.
If you use the default values for a development or test environment, make
sure the user who starts Data Collector has write permission on the base Data Collector runtime directory.
-
Create directories outside of the $SDC_DIST runtime directory for the
configuration, data, log, and resource files.
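For illustration, the directories might be created as follows. The base path and the subdirectory names are assumptions chosen for this sketch, not required locations; any directories outside of $SDC_DIST that the Data Collector user can write to should work.

```shell
# Hypothetical base location outside of $SDC_DIST; adjust to your environment.
SDC_BASE="${SDC_BASE:-$HOME/sdc}"

# One directory each for configuration, data, log, and resource files.
mkdir -p "$SDC_BASE/conf" "$SDC_BASE/data" "$SDC_BASE/log" "$SDC_BASE/resources"
```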
-
In the $SDC_DIST/libexec/sdc-env.sh file, set the
following environment variables to the newly created directories:
- SDC_CONF - The Data Collector configuration directory.
- SDC_DATA - The Data Collector directory for pipeline state and configuration
information.
- SDC_LOG - The Data Collector directory for logs.
- SDC_RESOURCES - The Data Collector directory for runtime resource files.
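A sketch of how these settings might look in sdc-env.sh, assuming the file uses standard shell export statements. The variable names come from the list above; the paths are hypothetical and should point to the directories you created.

```shell
# Hypothetical base location; replace with the directories you created.
SDC_BASE="${SDC_BASE:-$HOME/sdc}"

export SDC_CONF="$SDC_BASE/conf"            # configuration directory
export SDC_DATA="$SDC_BASE/data"            # pipeline state and configuration information
export SDC_LOG="$SDC_BASE/log"              # logs
export SDC_RESOURCES="$SDC_BASE/resources"  # runtime resource files
```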
-
Copy all files from $SDC_DIST/etc to the newly
created $SDC_CONF directory.
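The copy could be done along these lines. The default values for SDC_DIST and SDC_CONF below are hypothetical and are used only if the variables are not already set in your shell.

```shell
# Hypothetical defaults; set these to your actual locations.
SDC_DIST="${SDC_DIST:-$HOME/streamsets/streamsets-datacollector}"
SDC_CONF="${SDC_CONF:-$HOME/sdc/conf}"

# Make sure the destination exists before copying.
mkdir -p "$SDC_CONF"

if [ -d "$SDC_DIST/etc" ]; then
  # The trailing "/." copies the contents of etc, including hidden files,
  # rather than the etc directory itself.
  cp -R "$SDC_DIST/etc/." "$SDC_CONF/"
else
  echo "Directory not found: $SDC_DIST/etc" >&2
fi
```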
-
Use the following command from the $SDC_DIST directory to run Data
Collector as the system user account logged into the command
prompt:
bin/streamsets dc
Or, use the following command to run Data Collector in the
background:
nohup bin/streamsets dc &
Use the following command to run Data Collector as another system user account:
sudo -u <user> bin/streamsets dc
-
To access the Data Collector UI, enter the following URL
in the address bar of your browser: