Installing Transformer
The method that you use to install Transformer depends on the location of your Spark installation and where you choose to install Transformer.
You can install Transformer using any of the following methods:
- Local Spark installation - To get started with Transformer in a development environment, install Transformer and Spark on the same machine and run Spark locally. You develop and run Transformer pipelines on that single machine.
- Cluster Spark installation - In a production environment, use a Spark installation that runs on a cluster to leverage the performance and scale that Spark offers. Install Transformer on a machine that is configured to submit Spark jobs to the cluster. You develop Transformer pipelines locally on the machine where Transformer is installed. When you run Transformer pipelines, Spark distributes the processing across nodes in the cluster.
- Cloud Spark installation - When Spark is installed in the cloud, you can use the AWS or Azure marketplace to install Transformer as a cloud service. Running Transformer as a cloud service lets you easily run pipelines on cloud vendor Spark clusters such as Databricks, Amazon EMR, and Azure HDInsight.
After installing Transformer from a tarball, as a best practice, configure Transformer to use directories outside of the runtime directory so that configuration, data, log, and resource files are preserved when you upgrade.
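A minimal sketch of that setup might look like the following. The directory paths and the TRANSFORMER_CONF, TRANSFORMER_DATA, TRANSFORMER_LOG, and TRANSFORMER_RESOURCES environment variable names are assumptions based on the pattern used by other StreamSets products; check the environment script shipped with your Transformer release for the exact names.

```bash
# Assumed variable names and paths; verify against your Transformer release.
# Create directories outside the Transformer runtime (installation) directory.
mkdir -p /opt/transformer/conf /opt/transformer/data \
         /opt/transformer/log /opt/transformer/resources

# Point Transformer at the external directories before starting it.
export TRANSFORMER_CONF=/opt/transformer/conf
export TRANSFORMER_DATA=/opt/transformer/data
export TRANSFORMER_LOG=/opt/transformer/log
export TRANSFORMER_RESOURCES=/opt/transformer/resources
```

With a layout like this, upgrading Transformer replaces only the runtime directory and leaves your configuration, pipeline data, logs, and resources in place.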
If you use Docker, you can also run the Transformer image from Docker Hub.
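As a rough sketch, starting the container might look like the following; the streamsets/transformer image name, the <tag> placeholder, and port 19630 are assumptions to confirm against the image's documentation on Docker Hub.

```bash
# Run the Transformer image in the background and expose its web UI.
# Image name, tag, and port are assumptions; verify them on Docker Hub.
docker run -d --name transformer -p 19630:19630 streamsets/transformer:<tag>
```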