Extra Spark Configuration
When you create a pipeline, you can define extra Spark configuration properties that determine how the pipeline runs on Spark. Transformer passes the configuration properties to Spark when it launches the Spark application.
You can add any additional Spark configuration property, as described in the Spark configuration documentation.
| Configuration Property | Description |
|---|---|
| spark.home | Overrides the SPARK_HOME environment variable set on the machine. For example, if multiple Spark versions are installed locally on the Transformer machine, you can add the spark.home property to specify the Spark installation that the pipeline uses. |
| dag.triggerDuration | For streaming execution mode, the amount of time in milliseconds to wait between batches. Default is 1 millisecond. |
| transformer.ludicrous.mode | Runs the pipeline in ludicrous mode. |
| transformer.ludicrous.count.input | Generates and displays pipeline input statistics when running a pipeline in ludicrous mode. |
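As a sketch of how these might be entered, the following shows extra Spark configuration properties as key-value pairs. The property names come from the table above; the values, including the Spark installation path, are illustrative assumptions, not defaults:

```properties
# Hypothetical path to an alternate local Spark installation
spark.home=/opt/spark-custom
# Wait 2 seconds between batches in streaming execution mode (illustrative value)
dag.triggerDuration=2000
# Enable ludicrous mode and display pipeline input statistics
transformer.ludicrous.mode=true
transformer.ludicrous.count.input=true
```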
Performance Tuning Properties
| Spark Configuration Property | Description |
|---|---|
| spark.executor.instances | Number of Spark executors that the pipeline runs on. By default, Spark uses as many executors as required to run the pipeline. Use this configuration property when you need to limit executor usage in the cluster. |
| spark.executor.memory | Maximum amount of memory that each Spark executor uses to run the pipeline. |
| spark.executor.cores | Number of cores that each Spark executor uses to run the pipeline. Databricks does not allow overriding this configuration property. Do not use this property when running the pipeline on a Databricks cluster. |
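For example, the following hedged sketch caps a pipeline at four executors, each with 4 GB of memory and two cores. The values are illustrative; appropriate settings depend on your cluster and workload:

```properties
# Limit executor usage in the cluster (by default, Spark uses as many as required)
spark.executor.instances=4
# Maximum memory per executor
spark.executor.memory=4g
# Cores per executor -- omit this property on Databricks clusters
spark.executor.cores=2
```

Note that with these settings the pipeline's total executor footprint is roughly 4 executors × 4 GB = 16 GB of memory and 8 cores.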
For more information about these configuration properties, see the Spark Configuration documentation.