Extra Spark Configuration
When you create a pipeline, you can define extra Spark configuration properties that determine how the pipeline runs on Spark. Transformer passes the configuration properties to Spark when it launches the Spark application.
You can add any additional Spark configuration property, as described in the Spark configuration documentation.
| Configuration Property | Description |
|---|---|
| spark.home | Overrides the SPARK_HOME environment variable set on the machine. For example, if multiple Spark versions are installed locally on the Transformer machine, you can add the spark.home property to specify the Spark installation that the pipeline uses. |
| dag.triggerDuration | For streaming execution mode, the amount of time in milliseconds to wait between batches. Default is 1 millisecond. |
| transformer.ludicrous.mode | Runs the pipeline in ludicrous mode. |
| transformer.ludicrous.count.input | Generates and displays pipeline input statistics when running a pipeline in ludicrous mode. |
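As a sketch of how these might be entered, the following shows extra Spark configuration properties as key-value pairs. The property names come from the table above; the values, including the Spark installation path, are illustrative assumptions, not defaults:

```properties
# Hypothetical path to an alternate local Spark installation
spark.home=/opt/spark-custom
# Wait 2 seconds between batches in streaming execution mode (illustrative value)
dag.triggerDuration=2000
# Enable ludicrous mode and display pipeline input statistics
transformer.ludicrous.mode=true
transformer.ludicrous.count.input=true
```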
Performance Tuning Properties
| Spark Configuration Property | Description |
|---|---|
| spark.executor.instances | Number of Spark executors that the pipeline runs on. By default, Spark uses as many executors as required to run the pipeline. Use this configuration property when you need to limit executor usage in the cluster. |
| spark.executor.memory | Maximum amount of memory that each Spark executor uses to run the pipeline. |
| spark.executor.cores | Number of cores that each Spark executor uses to run the pipeline. Databricks does not allow overriding this configuration property. Do not use this property when running the pipeline on a Databricks cluster. |
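For example, the following hedged sketch caps a pipeline at four executors, each with 4 GB of memory and two cores. The values are illustrative; appropriate settings depend on your cluster and workload:

```properties
# Limit executor usage in the cluster (by default, Spark uses as many as required)
spark.executor.instances=4
# Maximum memory per executor
spark.executor.memory=4g
# Cores per executor -- omit this property on Databricks clusters
spark.executor.cores=2
```

Note that with these settings the pipeline's total executor footprint is roughly 4 executors × 4 GB = 16 GB of memory and 8 cores.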
For more information about these configuration properties, see the Spark Configuration documentation.