Control Hub Configuration File

You can customize how a registered Data Collector works with StreamSets Control Hub by editing the Control Hub configuration file, $SDC_CONF/dpm.properties, located in the Data Collector installation.

Use a text editor to edit the dpm.properties configuration file. To apply the changes, restart Data Collector.
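To check the current values before you restart Data Collector, you can read the file from a script. The following is a minimal Python sketch. It assumes that the file uses simple key=value lines with # or ! comment lines. It does not handle the line continuations, escape sequences, or ':' separators that Java-style properties files can also use. The helper names are hypothetical, and load_dpm_properties assumes that the SDC_CONF environment variable is set.

```python
import os


def parse_properties(text):
    """Parse simple key=value lines (hypothetical helper, simplified format).

    Blank lines and lines starting with '#' or '!' are treated as comments,
    so a commented-out property such as '#dpm.alias.name.enabled=true' is ignored.
    Line continuations and escape sequences are NOT handled.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # skip lines without '=' in this simplified sketch
        props[key.strip()] = value.strip()
    return props


def load_dpm_properties():
    """Read $SDC_CONF/dpm.properties (assumes SDC_CONF is set)."""
    path = os.path.join(os.environ["SDC_CONF"], "dpm.properties")
    with open(path, encoding="utf-8") as f:
        return parse_properties(f.read())


if __name__ == "__main__":
    sample = """
# Control Hub settings
dpm.enabled=true
dpm.base.url=https://<hostname>:18631
#dpm.alias.name.enabled=true
"""
    print(parse_properties(sample))
```

The sample prints only dpm.enabled and dpm.base.url, because the commented-out line is skipped.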

The Control Hub configuration file includes the following general properties:

General Property Description
dpm.enabled Specifies whether the Data Collector is enabled to work with Control Hub.

Default is false.

dpm.base.url URL to access Control Hub.

Set to the Control Hub URL provided by your system administrator. For example, https://<hostname>:18631.

dpm.registration.retry.attempts Maximum number of times that Data Collector attempts to register with Control Hub before failing the registration.

Default is 5.

dpm.security.validationTokenFrequency.secs Frequency in seconds at which Data Collector validates authentication and user tokens with Control Hub.

Default is 60.

dpm.appAuthToken File that contains the authentication token for this Data Collector instance. The file is located in $SDC_CONF, the Data Collector configuration directory.

Generally, you should not need to change this value.

dpm.remote.control.job.labels Labels to assign to this Data Collector. Use labels to group Data Collectors registered with Control Hub. To assign multiple labels, enter a comma-separated list of labels.

Default is "all", which you can use to run a job on all registered Data Collectors.

dpm.remote.control.ping.frequency Frequency in milliseconds at which Data Collector notifies Control Hub that it is running.

Default is 5,000.

dpm.remote.control.events.recipient Name of the internal Control Hub application to which Data Collector sends pipeline status updates.

Do not change this value.

dpm.remote.control.process.events.recipients Names of the internal Control Hub applications to which Data Collector sends performance updates, including CPU load and memory usage.

Do not change this value.

dpm.remote.control.status.events.interval Frequency in milliseconds at which Data Collector informs Control Hub of the following information:
  • Status of all local and published pipelines running on this Data Collector.
  • Performance information for this Data Collector, including CPU load and memory usage.

Default is 60,000.

dpm.remote.deployment.id For provisioned Data Collectors, the ID of the deployment that provisioned the Data Collector.

For manually administered Data Collectors, the value is blank.

Do not change this value.

http.meta.redirect.to.sso Enables the redirect of Data Collector user logins to Control Hub using the HTML meta refresh method. Set to true only if the registered Data Collector is installed as an application on Microsoft Azure HDInsight.

Default is false, which means that Data Collector uses HTTP redirect headers to redirect logins. Use the default for all other Data Collector installation types.

dpm.alias.name.enabled Enables using an abbreviated Control Hub user ID when Hadoop impersonation mode or shell impersonation mode is used.

By default, when using Hadoop impersonation mode or shell impersonation mode, a Data Collector registered with Control Hub uses the full Control Hub user ID as the user name, as follows:
<ID>@<organization ID>

Enable this property to use only the ID, omitting "@<organization ID>". For example, Data Collector then uses myname instead of myname@org as the user name.

To use the abbreviated user ID, uncomment the property and set it to true.

When using Hadoop impersonation mode, the Hadoop system, Data Collector, and the pipeline stages must be properly configured. For more information, see Hadoop Impersonation Mode.

When using shell impersonation mode, Data Collector and the operating system that runs the shell script must be properly configured. For more information, see Data Collector Shell Impersonation Mode.
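The user-name mapping that dpm.alias.name.enabled controls can be sketched in Python as follows. This is a minimal illustration. It assumes that the Control Hub user ID always has the form <ID>@<organization ID> and that the organization ID follows the last "@". The function name impersonation_user is hypothetical and is not part of Data Collector.

```python
def impersonation_user(control_hub_user_id, alias_enabled=False):
    """Return the user name passed to Hadoop or shell impersonation.

    Hypothetical helper. control_hub_user_id is assumed to look like
    "<ID>@<organization ID>". When alias_enabled is True (mirroring
    dpm.alias.name.enabled=true), the "@<organization ID>" suffix is dropped.
    """
    if not alias_enabled:
        return control_hub_user_id  # default: the full Control Hub user ID
    user_id, sep, _org_id = control_hub_user_id.rpartition("@")
    # rpartition returns ("", "", original) when there is no "@" at all
    return user_id if sep else control_hub_user_id


print(impersonation_user("myname@org"))        # myname@org
print(impersonation_user("myname@org", True))  # myname
```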
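The comma-separated format of dpm.remote.control.job.labels, described in the property table, can be illustrated with a short Python sketch that splits a label value into individual labels. Trimming whitespace and dropping empty entries are assumptions of this sketch, not documented Data Collector behavior. The helper name parse_labels and the labels western-region and prod are hypothetical.

```python
def parse_labels(value):
    """Split a dpm.remote.control.job.labels value into individual labels.

    Hypothetical helper: whitespace around each label is trimmed and empty
    entries (for example, from a trailing comma) are dropped.
    """
    labels = (label.strip() for label in value.split(","))
    return [label for label in labels if label]


# Example property line: dpm.remote.control.job.labels=all,western-region, prod
print(parse_labels("all,western-region, prod"))  # ['all', 'western-region', 'prod']
```

Keeping the default "all" label in the list, as in this example, lets the Data Collector still run jobs that target all registered Data Collectors while also belonging to more specific groups.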
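The timing properties in the property table use different units. dpm.security.validationTokenFrequency.secs is in seconds, while dpm.remote.control.ping.frequency and dpm.remote.control.status.events.interval are in milliseconds. The following Python sketch normalizes the documented defaults to seconds to make the difference concrete. The dictionary and helper names are hypothetical.

```python
# Documented defaults for the timing properties, with their units.
TIMING_DEFAULTS = {
    "dpm.security.validationTokenFrequency.secs": (60, "s"),
    "dpm.remote.control.ping.frequency": (5000, "ms"),
    "dpm.remote.control.status.events.interval": (60000, "ms"),
}


def to_seconds(value, unit):
    """Convert a timing value in seconds ("s") or milliseconds ("ms") to seconds."""
    if unit == "s":
        return float(value)
    if unit == "ms":
        return value / 1000
    raise ValueError(f"unknown unit: {unit!r}")


for name, (value, unit) in TIMING_DEFAULTS.items():
    print(f"{name}: {to_seconds(value, unit):g} s")
```

With the defaults, Data Collector validates tokens every 60 seconds, notifies Control Hub that it is running every 5 seconds, and reports pipeline status and performance every 60 seconds. Entering a value in the wrong unit, such as 5 for dpm.remote.control.ping.frequency, would mean 5 milliseconds rather than 5 seconds.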
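The limit that dpm.registration.retry.attempts places on registration can be pictured as a bounded retry loop. The following Python sketch is a generic illustration, not Data Collector's registration code. The register callable and the fixed delay between attempts are assumptions, because this page does not describe the delay or backoff behavior.

```python
import time


def register_with_retries(register, max_attempts=5, delay_secs=1.0):
    """Call register() until it succeeds or max_attempts is reached.

    register is a hypothetical callable that returns True on success.
    max_attempts mirrors dpm.registration.retry.attempts (default 5).
    delay_secs is an assumed fixed pause between attempts.
    Returns the number of attempts used; raises RuntimeError if all fail.
    """
    for attempt in range(1, max_attempts + 1):
        if register():
            return attempt
        if attempt < max_attempts:
            time.sleep(delay_secs)
    raise RuntimeError(f"registration failed after {max_attempts} attempts")


calls = {"count": 0}


def flaky_register():
    # Hypothetical stand-in for a registration call: succeeds on the third try.
    calls["count"] += 1
    return calls["count"] >= 3


print(register_with_retries(flaky_register, delay_secs=0))  # 3
```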
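The http.meta.redirect.to.sso property chooses between two standard browser redirect techniques. The following Python sketch contrasts them in generic form. An HTTP redirect puts the target URL in a Location response header, while an HTML meta refresh returns a page whose meta refresh tag tells the browser where to go. The function names, status codes, and URL are assumptions for illustration. This is not the response that Data Collector actually sends.

```python
from html import escape


def header_redirect(location):
    """HTTP redirect: a 3xx status plus a Location header (generic illustration)."""
    return 302, {"Location": location}, ""


def meta_refresh_redirect(location):
    """HTML meta refresh: a normal page whose <meta> tag makes the browser navigate."""
    body = (
        "<!DOCTYPE html><html><head>"
        f'<meta http-equiv="refresh" content="0; url={escape(location, quote=True)}">'
        "</head><body></body></html>"
    )
    return 200, {"Content-Type": "text/html"}, body


login_url = "https://controlhub.example.com:18631/login"  # hypothetical URL
print(header_redirect(login_url))
print(meta_refresh_redirect(login_url))
```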