Installing StreamSets for Databricks on Amazon Web Services
You can install StreamSets for Databricks on Amazon Web Services (AWS). StreamSets for Databricks includes both StreamSets Data Collector and Transformer.
For more details about StreamSets for Databricks on AWS, see the AWS Marketplace listing.
- In the AWS Marketplace, search for StreamSets, and then subscribe to the StreamSets for Databricks offering.
- Accept the terms and conditions, and then click Continue to Configuration.
- Select the appropriate AWS fulfillment options, and then click Continue to Launch.
- To launch StreamSets for Databricks from the AWS Marketplace website, choose Launch from Website, and then complete the following steps:
- Select the recommended EC2 instance type or choose another instance type based on your expected workload.
See the Data Collector installation requirements and the Transformer installation requirements for details.
- Select the appropriate VPC, subnet, and key pair settings.
- For the security group settings, click Create New Based on Seller Settings, enter a name for the new security group, and then configure the range of IP addresses allowed for each firewall rule.
Important: The default range of 0.0.0.0/0 gives all IP addresses access to Data Collector and Transformer. Be sure to modify the default values to restrict access to known IP addresses only.
| Firewall Rule | Description |
| --- | --- |
| Rule for port 18630 | Range of IP addresses that can access the Data Collector web-based UI on port 18630. |
| Rule for port 19630 | Range of IP addresses that can access the Transformer web-based UI on port 19630. |
- Click Launch.
- To launch StreamSets for Databricks from the AWS EC2 console, choose Launch through EC2, and then complete the following steps:
- Click Launch.
- Select the recommended EC2 instance type or choose another instance type based on your expected workload.
See the Data Collector installation requirements and the Transformer installation requirements for details.
- Configure the instance details, storage, and tags as needed.
- When configuring the security group for the instance, configure the range of IP addresses allowed for each firewall rule.
Important: The default range of 0.0.0.0/0 gives all IP addresses access to Data Collector and Transformer. Be sure to modify the default values to restrict access to known IP addresses only.
| Firewall Rule | Description |
| --- | --- |
| Rule for port 18630 | Range of IP addresses that can access the Data Collector web-based UI on port 18630. |
| Rule for port 19630 | Range of IP addresses that can access the Transformer web-based UI on port 19630. |
- After reviewing the details, click Launch.
- When launching the instance, note the instance ID on the Launch Status page.
The password to Data Collector and Transformer matches the instance ID.
AWS might require a few minutes to launch an instance.
- To access Data Collector, enter the following URL in the address bar of your browser:
http://<Public DNS of EC2 instance>:18630
For example, if your public DNS is ec2-12-345-678-999.compute-1.amazonaws.com, enter:
http://ec2-12-345-678-999.compute-1.amazonaws.com:18630
- To access Transformer, enter the following URL in the address bar of your browser:
http://<Public DNS of EC2 instance>:19630
For example, if your public DNS is ec2-12-345-678-999.compute-1.amazonaws.com, enter:
http://ec2-12-345-678-999.compute-1.amazonaws.com:19630
- To log in to either Data Collector or Transformer, enter admin as the user name and the EC2 instance ID as the password.
Tip: If you are new to Data Collector, consider starting with the Databricks Delta Lake solutions. If you are new to Transformer, here are the basics.
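The following Python sketch covers two details from the steps above that are easy to get wrong: building the Data Collector and Transformer URLs from the EC2 public DNS name, and checking whether a firewall rule range still admits all addresses. It is an illustration only and is not part of StreamSets or AWS tooling. The function names `ui_urls` and `is_open_to_all` are hypothetical, and the range 203.0.113.0/24 is a placeholder from the address block reserved for documentation.

```python
import ipaddress

# Web UI ports as listed in the firewall rule table above.
DATA_COLLECTOR_PORT = 18630
TRANSFORMER_PORT = 19630


def ui_urls(public_dns: str) -> dict[str, str]:
    """Build the Data Collector and Transformer UI URLs (hypothetical helper)."""
    host = public_dns.strip()
    return {
        "data_collector": f"http://{host}:{DATA_COLLECTOR_PORT}",
        "transformer": f"http://{host}:{TRANSFORMER_PORT}",
    }


def is_open_to_all(cidr: str) -> bool:
    """Return True if the CIDR range matches every address (hypothetical helper).

    A prefix length of 0 means the range is unrestricted, which is the
    case for the default 0.0.0.0/0 rule that the steps say to change.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    return network.prefixlen == 0


if __name__ == "__main__":
    # Example DNS name taken from the steps above.
    urls = ui_urls("ec2-12-345-678-999.compute-1.amazonaws.com")
    print(urls["data_collector"])
    print(urls["transformer"])

    # Placeholder ranges; substitute the ranges you plan to allow.
    for cidr in ("0.0.0.0/0", "203.0.113.0/24"):
        status = "open to all addresses" if is_open_to_all(cidr) else "restricted"
        print(f"{cidr}: {status}")
```

A check like this could be run against the ranges you intend to enter for ports 18630 and 19630 before you click Launch.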