Enabling HTTPS
- Data Collector
- Enable HTTPS for Data Collector to secure the communication to the Data Collector UI and REST API and to use the Data Collector as an authoring Data Collector in Control Hub.
- Cluster pipelines
- If you run cluster pipelines, enable HTTPS for cluster pipelines to secure the communication between the gateway and worker nodes in the cluster.
- Pipeline stages that connect to external systems
- During pipeline development, developers can enable specific stages to use SSL/TLS to secure the communication with an external system. For example, when designing a pipeline that writes to a Cassandra cluster enabled for SSL/TLS, the developer must configure the Cassandra destination to use SSL/TLS to connect to Cassandra.
By default, Data Collector and cluster pipelines use the HTTP protocol. StreamSets recommends using HTTPS in a production environment. HTTPS requires SSL/TLS certificates.
Prerequisites
Before you enable HTTPS for Data Collector and cluster pipelines, complete the following requirements:
- Obtain access to OpenSSL and Java keytool
- If you do not have keystore files that include SSL/TLS certificates signed by a certificate authority (CA), you can request certificates and create the keystore files using the following tools:
- OpenSSL - Use OpenSSL to create a Certificate Signing Request (CSR) that you send to the CA of your choice, as well as to create the keystore and truststore files. For more information, see the OpenSSL documentation.
- Java keytool - You can also use Java keytool to create a CSR and to create keystore and truststore files. Java keytool is part of the Java Development Kit (JDK). For more information, see the keytool documentation.
- Generate SSL/TLS certificate and private key pairs signed by a certificate authority (CA)
- To enable HTTPS for Data Collector, generate a single private key and public certificate pair for Data Collector. Data Collector provides a self-signed certificate that you can use. However, web browsers generally issue a warning for self-signed certificates. StreamSets strongly recommends that you generate a key and certificate pair signed by a CA.
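As a minimal sketch of the OpenSSL route described above, the following function generates a private key and a Certificate Signing Request (CSR) for a single host. The function name `make_csr`, the 2048-bit RSA key size, the host name, and the file names are illustrative assumptions; your CA might require additional subject fields.

```shell
# Generate an RSA private key and a CSR for one host.
# Arguments: common-name key-output csr-output
# (make_csr is a hypothetical helper name, not an OpenSSL or Data Collector command.)
make_csr() {
  common_name=$1
  key_out=$2
  csr_out=$3
  # -nodes leaves the private key unencrypted so that it can be loaded
  # unattended; the chmod below limits access to the file owner.
  openssl req -new -newkey rsa:2048 -nodes \
    -subj "/CN=$common_name" \
    -keyout "$key_out" \
    -out "$csr_out" &&
    chmod 600 "$key_out"
}

# Example with an assumed host name:
#   make_csr sdc.example.com sdc.key sdc.csr
#   openssl req -in sdc.csr -noout -subject   # inspect the CSR before sending it
```

Send the resulting CSR file to the CA, and keep the private key file on the machine that will use the signed certificate.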
Step 1. Create Keystore Files
Create a keystore file that includes each private key and public certificate pair signed by the CA. A keystore holds the private key and certificate that its owner presents to prove its identity during the SSL/TLS handshake.
To enable HTTPS for Data Collector, create a single keystore file for Data Collector.
To enable HTTPS for cluster pipelines, each worker node requires a keystore file. If you generated a unique certificate for each worker node, create a unique keystore file for each of those certificates. Alternatively, if you generated a SAN certificate valid for all of the worker nodes, create a single keystore file that all the worker nodes can use. Data Collector runs on the gateway node in the cluster, so the gateway node uses the same keystore file that you create for Data Collector.
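If you plan to share one SAN certificate across the worker nodes, it can help to confirm that the certificate covers every worker host name before you build the keystore. The following sketch uses the `-checkhost` option of `openssl x509`. The function name `cert_covers_hosts` and the host names are hypothetical, and the check relies on the message text that OpenSSL prints.

```shell
# Check that a PEM certificate is valid for every host name given.
# Arguments: cert.pem host [host ...]
# (cert_covers_hosts is a hypothetical helper name.)
cert_covers_hosts() {
  cert_pem=$1
  shift
  for host in "$@"; do
    # openssl prints "Hostname <host> does match certificate" on success
    # and "does NOT match certificate" otherwise.
    if ! openssl x509 -in "$cert_pem" -noout -checkhost "$host" |
        grep -q "does match certificate"; then
      echo "certificate does not cover $host" >&2
      return 1
    fi
  done
}

# Example with assumed host names:
#   cert_covers_hosts workers.pem worker1.example.com worker2.example.com
```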
StreamSets recommends creating keystores in the PKCS #12 (p12 file) format. In most cases, a CA issues certificates in PEM format. Use OpenSSL to directly import the certificate into a PKCS #12 keystore.
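The import into a PKCS #12 keystore can be sketched with `openssl pkcs12 -export`. The function name `make_p12_keystore`, the alias, the file names, and the `KEYSTORE_PASSWORD` environment variable are assumptions made for this example.

```shell
# Bundle a PEM certificate and its private key into a PKCS #12 keystore.
# Arguments: cert.pem key.pem output.p12 alias
# The keystore password is read from the exported KEYSTORE_PASSWORD variable
# so that it does not appear on the command line.
# (make_p12_keystore is a hypothetical helper name.)
make_p12_keystore() {
  cert_pem=$1
  key_pem=$2
  out_p12=$3
  alias=$4
  openssl pkcs12 -export \
    -in "$cert_pem" \
    -inkey "$key_pem" \
    -name "$alias" \
    -out "$out_p12" \
    -passout env:KEYSTORE_PASSWORD
}

# Example with assumed file names:
#   export KEYSTORE_PASSWORD='choose-a-password'
#   make_p12_keystore sdc-cert.pem sdc.key keystore.p12 sdc
```

If the CA also supplied intermediate certificates, the `-certfile` option of `openssl pkcs12` can add them to the same keystore.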
Step 2. Create a Truststore File
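A truststore file contains certificates from trusted CAs that an SSL/TLS client uses to verify the identity of an SSL/TLS server.
Data Collector acts as an SSL/TLS client and uses a truststore file to verify the identity of the following servers: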
- Secure LDAP server when Data Collector is configured for secure LDAP authentication.
- Control Hub on-premises installation enabled for HTTPS when Data Collector is registered with Control Hub on-premises.
- Worker node when Data Collector runs cluster pipelines enabled for HTTPS.
If you've enabled HTTPS for cluster pipelines, worker nodes require a truststore file to verify the identity of the gateway node where Data Collector is installed.
By default, Data Collector and worker nodes use the default Java truststore file located in $JAVA_HOME/jre/lib/security/cacerts. If your certificates are signed by a trusted CA that is included in the default Java truststore file, you do not need to create a truststore file for Data Collector or worker nodes and can skip this step.
If your certificates are signed by a private CA, or by any CA that the default Java truststore does not trust, you must create a custom truststore file or modify a copy of the default Java truststore file so that the Data Collector and worker node truststore file includes the root and intermediate CA certificates. For example, if your organization generates its own certificates, you must add the root and intermediate certificates for your organization to the truststore file.
You can create a single truststore file for both Data Collector and the worker nodes, or you can create separate truststore files.
In these steps, we show how to modify a copy of the default truststore file to add an additional CA to the list of trusted CAs. We assume that the same CA signed our certificates used by Data Collector and by each worker node in the cluster. If multiple CAs signed your certificates, you'll need to add each CA to the truststore file.
If you prefer to create a custom truststore file, see the keytool documentation.
The truststore file can use either of the following formats:
- Java keystore file (JKS)
- PKCS #12 (p12 file)
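Modifying a copy of the default truststore can be sketched with `keytool -importcert`. The function name `add_ca_to_truststore`, the alias, the file names, and the `TRUSTSTORE_PASSWORD` environment variable are assumptions made for this example. The default Java truststore normally uses the password `changeit`, but verify this for your JDK.

```shell
# Copy an existing truststore and add one CA certificate to the copy.
# Arguments: source-truststore destination-truststore ca-cert.pem alias
# The truststore password is read from the exported TRUSTSTORE_PASSWORD variable.
# (add_ca_to_truststore is a hypothetical helper name.)
add_ca_to_truststore() {
  src_truststore=$1
  dest_truststore=$2
  ca_cert_pem=$3
  ca_alias=$4
  # Work on a copy so that the default Java truststore stays untouched.
  cp "$src_truststore" "$dest_truststore" &&
    chmod u+w "$dest_truststore" &&
    keytool -importcert -noprompt -trustcacerts \
      -alias "$ca_alias" \
      -file "$ca_cert_pem" \
      -keystore "$dest_truststore" \
      -storepass:env TRUSTSTORE_PASSWORD
}

# Example with assumed file names; repeat for each root and intermediate CA:
#   export TRUSTSTORE_PASSWORD=changeit
#   add_ca_to_truststore "$JAVA_HOME/jre/lib/security/cacerts" \
#     truststore.jks my-root-ca.pem my-root-ca
```

Run the `keytool -importcert` step once per CA certificate, using a different alias each time, when more than one CA signed your certificates.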
Step 3. Configure Data Collector to Use HTTPS
Modify Data Collector configuration properties to configure Data Collector to use a secure port and your keystore file. If you created a custom truststore file or modified a copy of the default Java truststore file, configure Data Collector to use that truststore file.
Step 4. Configure Cluster Pipelines to Use HTTPS
To enable HTTPS for cluster pipelines, configure the gateway and worker nodes in the cluster to use HTTPS. If you do not run cluster pipelines, you can skip this step.
Modify the Data Collector configuration file, sdc.properties, on the gateway node to configure the worker nodes to use the keystore file stored on each worker node. If you created a custom truststore file or modified a copy of the default Java truststore file, configure the gateway and worker nodes to use the truststore file.