Extra Spark Configuration

When you create a pipeline, you can define extra Spark configuration properties that determine how the pipeline runs on Spark. Transformer passes the configuration properties to Spark when it launches the Spark application.

You can add any additional Spark configuration property, as described in the Spark configuration documentation.

You can also add the following extra configuration properties provided by Transformer. These are not Spark configuration properties:
Configuration Property: Description

spark.home: Overrides the SPARK_HOME environment variable set on the machine.

For example, suppose that multiple Spark versions are installed locally on the Transformer machine. You can add the spark.home configuration property to run the pipeline on a Spark version other than the one set in the environment variable.

dag.triggerDuration: For streaming execution mode, the amount of time, in milliseconds, to wait between batches.

Default is 1 millisecond.

transformer.ludicrous.mode: Runs the pipeline in ludicrous mode.

transformer.ludicrous.count.input: Generates and displays pipeline input statistics when running a pipeline in ludicrous mode.
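As an illustration only, the following Python sketch models a pipeline's extra configuration as name/value pairs and separates the Transformer-provided properties listed above from regular Spark properties. The helper split_properties, the install path, and the 5-second interval are hypothetical and chosen for the example. The sketch shows one way to organize the property names; it does not describe how Transformer processes them internally.

```python
# Names of the Transformer-provided properties listed above.
TRANSFORMER_PROPERTIES = {
    "spark.home",
    "dag.triggerDuration",
    "transformer.ludicrous.mode",
    "transformer.ludicrous.count.input",
}


def split_properties(extra_config):
    """Partition name/value pairs into (Transformer-provided, other) dicts.

    Hypothetical helper for illustration; not part of Transformer.
    """
    transformer = {k: v for k, v in extra_config.items()
                   if k in TRANSFORMER_PROPERTIES}
    other = {k: v for k, v in extra_config.items()
             if k not in TRANSFORMER_PROPERTIES}
    return transformer, other


# Example values are assumptions, not recommendations.
extra_config = {
    "spark.home": "/opt/spark-alt",        # hypothetical install path
    "dag.triggerDuration": str(5 * 1000),  # 5 seconds, in milliseconds
    "spark.sql.shuffle.partitions": "64",  # a regular Spark property
}

transformer_props, spark_props = split_properties(extra_config)
print(transformer_props)
print(spark_props)
```

Note that dag.triggerDuration is expressed in milliseconds, so a 5-second wait between batches is entered as 5000.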

Performance Tuning Properties

To tune the performance of a pipeline that runs on a cluster, you can add the following Spark configuration properties to override the default Spark values:
Spark Configuration Property: Description

spark.executor.instances: Number of Spark executors that the pipeline runs on.

By default, Spark uses as many executors as required to run the pipeline. Use this configuration property when you need to limit executor usage in the cluster.

spark.executor.memory: Maximum amount of memory that each Spark executor uses to run the pipeline.

spark.executor.cores: Number of cores that each Spark executor uses to run the pipeline.

Databricks does not allow overrides of this configuration property. Do not use this property when running the pipeline on a Databricks cluster.
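To make the three tuning properties concrete, the following Python sketch builds them as name/value pairs and renders them in the standard spark-submit --conf name=value form. Transformer launches the Spark application itself, so the rendered arguments are shown only as a familiar equivalent. The helpers tuning_properties and as_conf_args and the sample values (4 executors, 8g of memory, 2 cores) are assumptions made for the example.

```python
def tuning_properties(instances, memory, cores=None, on_databricks=False):
    """Build the performance tuning properties as name/value pairs.

    Hypothetical helper for illustration; not part of Transformer or Spark.
    """
    props = {
        "spark.executor.instances": str(instances),
        "spark.executor.memory": memory,  # Spark size string, such as "8g"
    }
    if cores is not None:
        if on_databricks:
            # Databricks does not allow overrides of spark.executor.cores.
            raise ValueError(
                "spark.executor.cores cannot be overridden on Databricks")
        props["spark.executor.cores"] = str(cores)
    return props


def as_conf_args(props):
    """Render properties as spark-submit style --conf arguments."""
    args = []
    for name, value in props.items():
        args += ["--conf", f"{name}={value}"]
    return args


# Sample values are assumptions, not recommendations.
props = tuning_properties(instances=4, memory="8g", cores=2)
conf_args = as_conf_args(props)
print(" ".join(conf_args))
```

For a pipeline that runs on a Databricks cluster, the same helper would be called without the cores argument so that spark.executor.cores is never set.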

For more information about these configuration properties, see the Spark configuration documentation.