Administration
Viewing Data Collector Configuration Properties
For details about the configuration properties or to edit the configuration file, see Configuring Data Collector.
Viewing Data Collector Directories
You can view the directories that Data Collector uses. For example, you might check the directories to locate a file in a directory or to find the directory whose available space you need to increase.
Data Collector directories are defined in environment variables. For more information, see Data Collector Environment Configuration.
To view Data Collector directories, click .
| Directory | Includes | Environment Variable |
|---|---|---|
| Runtime | Base directory for Data Collector executables and related files. | SDC_DIST |
| Configuration | The Data Collector configuration file, sdc.properties, and related realm properties files and keystore files. Also includes the Log4j properties file. | SDC_CONF |
| Data | Pipeline configuration and run details. | SDC_DATA |
| Log | Data Collector log file, sdc.log. | SDC_LOG |
| Resources | Directory for runtime resource files. | SDC_RESOURCES |
Viewing Data Collector Metrics
You can view metrics about Data Collector, such as the CPU usage or the number of pipeline runners in the thread pool.
Viewing Data Collector Logs
You can view and download log data. When you download log data, you can select the file to download.
Modifying the Log Level
If the Data Collector logs do not provide enough troubleshooting information, you can modify the log level to display messages at another severity level. You can use the following log levels:
- TRACE
- DEBUG
- INFO (Default)
- WARN
- ERROR
- FATAL
When you’ve finished troubleshooting, set the log level back to INFO to avoid having verbose log files.
Shutting Down Data Collector
- For CentOS 6, Red Hat Enterprise Linux 6, or Ubuntu 14.04 LTS, use:
  service sdc stop
- For CentOS 7, Red Hat Enterprise Linux 7, or Ubuntu 16.04 LTS, use:
  systemctl stop sdc
To shut down a Data Collector process by its process ID, use:
kill -15 <process ID>
To use the Data Collector UI for shutdown:
- Click .
- When a confirmation dialog box appears, click Yes.
Restarting Data Collector
docker restart.
Viewing Users and Groups
If you use file-based authentication, you can view all user accounts granted access to this Data Collector instance, including the roles and groups assigned to each user.
To view users and groups, click . Data Collector displays a read-only view of the users, groups, and roles.
You configure users, groups, and roles for file-based authentication in the associated
realm.properties file located in the Data Collector
configuration directory, $SDC_CONF. For more information, see Configuring File-Based Authentication.
Support Bundles
You can use Data Collector to generate a support bundle. A support bundle is a ZIP file that includes Data Collector logs, environment and configuration information, pipeline JSON files, resource files, and pipeline snapshots. You upload the generated file to the StreamSets support team so that we can use the information to troubleshoot your support tickets. Or, you can download the generated file and then send the file to another StreamSets community member.
Data Collector uses several generators to create a support bundle. Each generator bundles different types of information. You can choose to use all or some of the generators.
Each generator automatically redacts all passwords entered in pipelines, configuration files, or resource files. The generators replace all passwords with the text "REDACTED" in the generated files. You can customize the generators to redact other sensitive information, such as machine names or usernames.
Before uploading a generated ZIP file to support, we recommend verifying that the file does not include any other sensitive information that you do not want to share.
Including Customer IDs
When you have a paid subscription for StreamSets, configure Data Collector to include your StreamSets customer ID in support bundles before submitting a bundle. This enables the StreamSets support team to easily locate and prioritize your bundles.
As a paid subscriber, you can contact the
StreamSets support team for your customer ID. The support team sends a customer ID file
named customer.id.
The steps for including the customer ID in support bundles differ based on the Data Collector installation method.
Cloudera Manager Installation
Data Collector version 3.4.0 or later provides a customer.id property that displays in Cloudera Manager and enables including your customer ID in support bundles. Earlier versions of Data Collector require performing alternate steps.
Data Collector 3.4.0 or Later
- Contact the StreamSets support team for your customer ID.
  The support team generates and sends a customer ID file named customer.id.
- Copy the customer ID enclosed in the file.
- In Cloudera Manager, paste your customer ID in the customer.id property for the StreamSets service.
Data Collector 3.3.x or Earlier
- Contact the StreamSets support team for your customer ID.
  The support team generates and sends a customer ID file named customer.id.
- In Cloudera Manager, verify the directory specified in the SDC_DATA property for the StreamSets service.
- Place the customer ID file in the $SDC_DATA directory on every node that runs Data Collector.
Other Installation Types
- Contact the StreamSets support team for your customer ID.
  The support team generates and sends a customer ID file named customer.id.
- Place the customer ID file in the $SDC_DATA directory for each Data Collector.
Tip: To easily verify the location of the $SDC_DATA directory, click and check the Data Directory property.
Generators
Data Collector can use the following generators to create a support bundle:
| Generator | Description |
|---|---|
| SDC Info | Includes the following information: |
| Pipelines | Includes the following JSON files for each pipeline: By default, all Data Collector pipelines are included in the bundle. |
| Logs | Includes the most recent content of the following log files: |
| Snapshots | Includes snapshots created for each pipeline. |
In addition, Data Collector always generates the following files when you create a support bundle:
- metadata.properties - ID and version of the Data Collector that generated the bundle.
- generators.properties - List of generators used for the bundle.
Generating a Support Bundle
When you generate a support bundle, you choose the information to include in the bundle. Only users with the admin role can generate support bundles.
bundle.upload.enabled property in the Data Collector configuration file, $SDC_CONF/sdc.properties.
For more information, see Configuring Data Collector.
- Click the Help icon, and then click Support Bundle.
- Select the generators that you want to use.
- Click one of the following options:
  - Download - Generates the support bundle and saves the ZIP file to your default downloads directory.
    Use to verify that the ZIP file does not include other sensitive information that you do not want to share. For example, you might want to remove the pipelines not associated with your support ticket. By default, all Data Collector pipelines are included in the bundle. If you modify the ZIP file in any way, you must manually upload the file to StreamSets support.
  - Upload - Generates the support bundle and automatically uploads the ZIP file to the StreamSets support team.
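The Download option exists so that you can inspect the ZIP file before sharing it. As a rough illustration of that check, the following Python sketch searches every entry of a downloaded bundle for strings that you consider sensitive. The function name, the entry names, and the search terms are hypothetical, and the sketch makes no assumption about the internal layout of a real support bundle.

```python
import io
import zipfile


def find_sensitive_terms(bundle, terms):
    """Return sorted (entry name, term) pairs for each term found in a ZIP entry.

    `bundle` is a path or a binary file-like object for the downloaded ZIP.
    `terms` is your own list of strings to look for; matching is case-insensitive.
    """
    hits = set()
    lowered = [term.lower() for term in terms]
    with zipfile.ZipFile(bundle) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Entries are treated as text; bytes that cannot be decoded are replaced.
            raw = archive.read(info.filename)
            text = raw.decode("utf-8", errors="replace").lower()
            for term in lowered:
                if term in text:
                    hits.add((info.filename, term))
    return sorted(hits)


# Demo with an in-memory ZIP standing in for a downloaded bundle (hypothetical content).
demo_buffer = io.BytesIO()
with zipfile.ZipFile(demo_buffer, "w") as demo:
    demo.writestr("example/sdc.log", "Connected to db01.internal.example as etl_user")
    demo.writestr("example/notes.txt", "password=REDACTED")

for entry, term in find_sensitive_terms(demo_buffer, ["db01.internal.example", "etl_user"]):
    print(f"{entry}: contains {term}")
```

If the scan reports hits, you can either remove the affected files and upload the ZIP manually, or add a redaction rule and generate the bundle again.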
Customizing Generators
By default, the generators redact all passwords entered in pipelines, configuration files, or resource files. You can customize the generators to redact other sensitive information, such as machine names or usernames.
To customize the generators, modify the support bundle redactor file, $SDC_CONF/support-bundle-redactor.json. The file contains rules that the generators use to redact sensitive information. Each rule contains the following information:
- description - Description of the rule.
- trigger - String constant that triggers a redaction. If a line contains this trigger string, then the redaction continues by applying the regular expression specified in the search property.
- search - Regular expression that defines the sub-string to redact.
- replace - String to replace the redacted information with.
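To make the interplay of the trigger, search, and replace properties concrete, here is a minimal Python sketch of how a single rule could be applied to one line of text. It is an illustration only, not the Data Collector implementation: Data Collector runs on Java, whose regular expression flavor can differ from Python's, and the function name `redact_line` is hypothetical. The rule values mirror the sample rule in this section.

```python
import re

# Values taken from the sample rule in this section.
RULE = {
    "description": "Custom domain names",
    "trigger": ".streamsets.com",
    "search": "[a-z_-]+.streamsets.com",
    "replace": "REDACTED.streamsets.com",
}


def redact_line(line, rule):
    """Apply one redaction rule to one line of text."""
    # Step 1: plain substring check. Lines without the trigger are left untouched,
    # so the regular expression is never evaluated for them.
    if rule["trigger"] not in line:
        return line
    # Step 2: replace every substring matched by the search expression.
    # A function is used as the replacement so the text is inserted literally.
    return re.sub(rule["search"], lambda match: rule["replace"], line)


print(redact_line("host=node-a.streamsets.com port=18630", RULE))
print(redact_line("host=localhost port=18630", RULE))
```

The trigger acts as a cheap filter: only lines that contain the constant string pay the cost of the regular expression.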
{
  "description": "Custom domain names",
  "trigger": ".streamsets.com",
  "search": "[a-z_-]+.streamsets.com",
  "replace": "REDACTED.streamsets.com"
}
REST Response
You can view REST response JSON data for different aspects of the Data Collector, such as pipeline configuration information or monitoring details.
You can use the REST response information to provide Data Collector details to a REST-based monitoring system. Or you might use the information in conjunction with the Data Collector REST API.
- Pipeline Configuration - Provides information about the pipeline and each stage in the pipeline.
- Pipeline Rules - Provides information about metric and data rules and alerts.
- Definitions - Provides information about all available Data Collector stages.
- Preview Data - Provides information about the preview data moving through the pipeline. Also includes monitoring information that is not used in preview.
- Pipeline Monitoring - Provides monitoring information for the pipeline.
- Pipeline Status - Provides the current status of the pipeline.
- Data Collector Metrics - Provides metrics about Data Collector.
- Thread Dump - Lists all active Java threads used by Data Collector.
Viewing REST Response Data
You can view REST response data from the location where the relevant information displays. For example, you can view Data Collector Metrics REST response data from the Data Collector Metrics page.
- Edit mode - From the Properties panel, you can use the More icon to view the following REST response data:
  - Pipeline Configuration
  - Pipeline Rules
  - Pipeline Status
  - Definitions
- Preview mode - From the Preview panel, you can use the More icon to view the Preview Data REST response data.
- Monitor mode - From the Monitor panel, you can use the More icon to view the following REST response data:
  - Pipeline Monitoring
  - Pipeline Configuration
  - Pipeline Rules
  - Pipeline Status
  - Definitions
- Data Collector Metrics page - From the Data Collector Metrics page, you can use the More icon to view the following REST response data:
  - Data Collector Metrics
  - Thread Dump
Disabling the REST Response Menu
You can configure the Data Collector to disable the display of REST responses.
- To disable the REST Response menus, click the Help icon, and then click Settings.
- In the Settings window, select Hide the REST Response Menu.
Command Line Interface
Data Collector
provides a command line interface that includes a basic cli command. Use the
command to perform some of the same actions that you can complete from the Data Collector UI. Data Collector must be
running before you can use the cli command.
You can use the following commands with the cli command:
- help - Provides information about each command or subcommand.
- manager - Provides the following subcommands:
  - start - Starts a pipeline.
  - status - Returns the status of a pipeline.
  - stop - Stops a pipeline.
  - reset-origin - Resets the origin when possible.
  - get-committed-offsets - Returns the last-saved offset for pipeline failover.
  - update-committed-offsets - Updates the last-saved offset for pipeline failover.
- store - Provides the following subcommands:
  - import - Imports a pipeline.
  - list - Lists information for all available pipelines.
- system - Provides the following subcommands:
  - enableDPM - Registers the Data Collector with StreamSets Control Hub.
  - disableDPM - Unregisters the Data Collector from Control Hub.
Java Configuration Options for the Cli Command
Use the SDC_CLI_JAVA_OPTS environment variable to modify Java configuration options for the cli command.
For example, to set the -Djavax.net.ssl.trustStore option for the cli command when using Data Collector with HTTPS, run the following command:
export SDC_CLI_JAVA_OPTS="-Djavax.net.ssl.trustStore=<path to truststore file> ${SDC_CLI_JAVA_OPTS}"
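If you launch the cli command from a script rather than an interactive shell, the same prepend-and-keep-existing pattern can be reproduced when building the child process environment. The following Python sketch shows one way to do that. The helper name `with_cli_java_opt` and the truststore path are hypothetical; only the SDC_CLI_JAVA_OPTS variable and the -Djavax.net.ssl.trustStore option come from this section.

```python
import os


def with_cli_java_opt(option, env=None):
    """Return a copy of the environment with `option` prepended to SDC_CLI_JAVA_OPTS."""
    # Copy so that the caller's environment (or os.environ) is not modified.
    new_env = dict(os.environ if env is None else env)
    existing = new_env.get("SDC_CLI_JAVA_OPTS", "")
    # Mirrors: export SDC_CLI_JAVA_OPTS="<option> ${SDC_CLI_JAVA_OPTS}"
    new_env["SDC_CLI_JAVA_OPTS"] = f"{option} {existing}".strip()
    return new_env


# Hypothetical truststore path, for illustration only.
env = with_cli_java_opt(
    "-Djavax.net.ssl.trustStore=/path/to/truststore.jks",
    {"SDC_CLI_JAVA_OPTS": "-Xmx64m"},
)
print(env["SDC_CLI_JAVA_OPTS"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the cli command.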
Using the Cli Command
Call the cli command from the $SDC_DIST directory.
Use the following syntax for cli commands:
bin/streamsets cli \
(-U <sdcURL> | --url <sdcURL>) \
[(-a <sdcAuthType> | --auth-type <sdcAuthType>)] \
[(-u <sdcUser> | --user <sdcUser>)] \
[(-p <sdcPassword> | --password <sdcPassword>)] \
[(-D <dpmURL> | --dpmURL <dpmURL>)] \
<command> <subcommand> [<args>]
The usage of the basic command options depends on whether or not the Data Collector is registered with Control Hub.
Not Registered with Control Hub
| Option | Description |
|---|---|
| -U <sdcURL> or --url <sdcURL> | Required. URL of the Data Collector. The default URL is |
| -a <sdcAuthType> or --auth-type <sdcAuthType> | Optional. HTTP authentication type used by the Data Collector. |
| -u <sdcUser> or --user <sdcUser> | Optional. User name to use to log in. The roles assigned to the user account determine the tasks that you can perform. If you omit this option, the Data Collector allows admin access. |
| -p <sdcPassword> or --password <sdcPassword> | Optional. Required when you enter a user name. Password for the user account. |
| -D <dpmURL> or --dpmURL <dpmURL> | Not applicable. Do not use when the Data Collector is not registered with Control Hub. |
| <command> | Required. Command to perform. |
| <subcommand> | Required for all commands except help. Subcommand to perform. |
| <args> | Optional. Include arguments and options as needed. |
Registered with Control Hub
| Option | Description |
|---|---|
| -U <sdcURL> or --url <sdcURL> | Required. URL of the Data Collector. The default URL is |
| -a <sdcAuthType> or --auth-type <sdcAuthType> | Required. Authentication type used by the Data Collector. Set to dpm. If you omit this option, Data Collector uses the Form authentication type, which causes the command to fail. |
| -u <sdcUser> or --user <sdcUser> | Required. User account to log in. Enter your Control Hub user ID using the following format: The roles assigned to the Control Hub user account determine the tasks that you can perform. If you omit this option, Data Collector uses the admin user account, which causes the command to fail. |
| -p <sdcPassword> or --password <sdcPassword> | Required. Enter the password for your Control Hub user account. |
| -D <dpmURL> or --dpmURL <dpmURL> | Required. Set to: https://cloud.streamsets.com. |
| <command> | Required. Command to perform. |
| <subcommand> | Required for all commands except help. Subcommand to perform. |
| <args> | Optional. Include arguments and options as needed. |
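The option rules in the two tables above can be summarized in a small helper that assembles the argument list for the cli command and rejects combinations that the tables describe as failing. This Python sketch is an illustration under stated assumptions: the function name `build_cli_args` and its parameters are hypothetical, it uses only the short option forms documented above, and it does not run Data Collector.

```python
def build_cli_args(url, command, subcommand=None, args=(),
                   auth_type=None, user=None, password=None, dpm_url=None):
    """Assemble the argument list for `bin/streamsets cli` from the documented options."""
    if command != "help" and subcommand is None:
        raise ValueError("a subcommand is required for all commands except help")
    if user is not None and password is None:
        raise ValueError("a password is required when you enter a user name")
    if dpm_url is not None:
        # Registered with Control Hub: -a dpm and -u are required (and -p with -u).
        if auth_type != "dpm" or user is None:
            raise ValueError("Control Hub requires auth type dpm and a user account")

    argv = ["bin/streamsets", "cli", "-U", url]
    options = (("-a", auth_type), ("-u", user), ("-p", password), ("-D", dpm_url))
    for flag, value in options:
        if value is not None:
            argv += [flag, value]
    argv.append(command)
    if subcommand is not None:
        argv.append(subcommand)
    argv.extend(args)
    return argv


# Not registered with Control Hub: only the URL is required.
print(build_cli_args("http://localhost:18630", "help", args=["manager"]))
```

The resulting list can be passed to `subprocess.run` with the working directory set to $SDC_DIST, which is where the cli command is called from.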
Help Command
Use the help command to view additional information for the specified command.
bin/streamsets cli \
(-U <sdcURL> | --url <sdcURL>) \
[(-a <sdcAuthType> | --auth-type <sdcAuthType>)] \
[(-u <sdcUser> | --user <sdcUser>)] \
[(-p <sdcPassword> | --password <sdcPassword>)] \
[(-D <dpmURL> | --dpmURL <dpmURL>)] \
help <command> [<subcommand>]
For example, the following command displays help for the manager command:
bin/streamsets cli -U http://localhost:18630 help manager
Manager Command
The manager command provides subcommands to start and stop a pipeline,
view the status of all pipelines, and reset the origin for a pipeline. It can also be used
to get the last-saved offset and to update the last-saved offset for a pipeline.
The manager command returns the pipeline status object after it successfully completes the specified subcommand. The following is a sample of the pipeline status object:
{
"user" : "admin",
"name" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
"pipelineID" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
"rev" : "0",
"status" : "STOPPING",
"message" : null,
"timeStamp" : 1447116703147,
"attributes" : { },
"executionMode" : "STANDALONE",
"metrics" : null,
"retryAttempt" : 0,
"nextRetryTimeStamp" : 0
}
Note that the timestamp is in the Long data format.
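When a script consumes the pipeline status object, the Long timestamp usually needs to be converted to a readable date. The following Python sketch parses the sample object above and converts the value, assuming that the Long is milliseconds since the Unix epoch in UTC. That is the usual convention for Java timestamps, but this section does not state it explicitly. The function name `status_summary` is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone

# Sample pipeline status object from this section.
SAMPLE_STATUS = """{
  "user" : "admin",
  "name" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
  "pipelineID" : "MyPipelinejf45e1f1-dfc1-402c-8587-918bc6e831db",
  "rev" : "0",
  "status" : "STOPPING",
  "message" : null,
  "timeStamp" : 1447116703147,
  "attributes" : { },
  "executionMode" : "STANDALONE",
  "metrics" : null,
  "retryAttempt" : 0,
  "nextRetryTimeStamp" : 0
}"""

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def status_summary(raw):
    """Return (pipeline ID, status, timestamp as an aware UTC datetime)."""
    status = json.loads(raw)
    # Assumption: timeStamp is milliseconds since the Unix epoch.
    # Adding a timedelta avoids floating-point rounding of the milliseconds.
    when = EPOCH + timedelta(milliseconds=status["timeStamp"])
    return status["pipelineID"], status["status"], when


pipeline_id, state, when = status_summary(SAMPLE_STATUS)
print(pipeline_id, state, when.isoformat())
```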
You can use the following manager subcommands:
- start - Starts a pipeline. Returns the pipeline status when successful.
- stop - Stops a pipeline. Returns the pipeline status when successful.
- status - Returns the status of a pipeline. Returns the pipeline status when successful.
- reset-origin - Resets the origin of a pipeline. Use for pipeline origins that can be reset. Some pipeline origins cannot be reset. Returns the pipeline status when successful.
- get-committed-offsets - Returns the last-saved offset for a pipeline with an origin that saves offsets. Some origins, such as the HTTP Server, have no need to save offsets.
- update-committed-offsets - Updates the last-saved offset for a pipeline with an origin that saves offsets. Some origins, such as the HTTP Server, have no need to save offsets.
Store Command
The store command provides subcommands to view a list of all pipelines
and to import a pipeline.
You can use the following subcommands with the store command:
- list - Lists all available pipelines. The list subcommand uses the following syntax:
  store list
- import - Imports a pipeline. Use to import a pipeline JSON file, typically exported from a Data Collector. Returns a message when the import is successful.
System Command
The system command provides subcommands to register and unregister the Data Collector with Control Hub.
You can use the following subcommands with the system command:
- enableDPM - Registers the Data Collector with Control Hub. For a description of the syntax, see Registering from the Command Line Interface.
- disableDPM - Unregisters the Data Collector from Control Hub. For a description of the syntax, see Unregistering from the Command Line Interface.