Pipelines
Use the following tips for help with pipeline errors:
- A pipeline fails to start with the following error:
  org.apache.spark.SparkException: Dynamic allocation of executors requires the external shuffle service. You may enable this through spark.shuffle.service.enabled.
- A pipeline fails to start with the following error:
  TRANSFORMER_03 Databricks shared cluster <name> already contains StreamSets libraries from a different staging directory <directory>. Either replace 'Pipeline Config > Staging Directory' config value to <directory> or uninstall all libraries from the shared cluster and restart the cluster.
  This error occurs when you try to run a pipeline on an existing Databricks cluster that has previously run pipelines built on a different version of Transformer, which is not allowed.
- A pipeline preview, validation, or run fails with the following error:
  TRANSFORMER_02 Failed to <preview, validate, or run> pipeline, check logs for error. The Transformer Spark application in the Spark cluster might not be able to reach Transformer at <URL>. If this is not the correct URL, update the transformer.base.http.url property in the Transformer configuration file or define a cluster callback URL for the pipeline and restart Transformer.
  This error occurs when Spark cannot communicate with Transformer using the settings defined in the Transformer configuration properties.
- A pipeline fails with the following run error:
  org.apache.spark.sql.AnalysisException: Found duplicate column(s)
- Pipeline validation fails with the following stage library/cluster manager mismatch error:
  VALIDATION_0300 Stage <stage name> using the <stage library name> stage library cannot be used with the <cluster type> cluster manager type
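
For the dynamic allocation error in the first tip, the message itself names the two Spark properties involved. As a rough illustration, the following Python sketch checks a set of Spark properties for that combination before a pipeline is started. The function name and the example dictionary are hypothetical, not part of Transformer or Spark. The sketch also assumes that an unset property counts as false, which might not match the defaults of your cluster.

```python
def check_dynamic_allocation(conf):
    """Return a list of problems found in a dict of Spark properties.

    Hypothetical helper: it only inspects the two properties named in
    the SparkException message and treats unset properties as "false".
    """
    def is_true(key):
        return str(conf.get(key, "false")).strip().lower() == "true"

    problems = []
    if is_true("spark.dynamicAllocation.enabled") and not is_true(
        "spark.shuffle.service.enabled"
    ):
        problems.append(
            "spark.dynamicAllocation.enabled is true but "
            "spark.shuffle.service.enabled is not"
        )
    return problems


# Hypothetical set of extra Spark properties for a pipeline.
example_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "false",
}
print(check_dynamic_allocation(example_conf))
```

If the check reports a problem, the error message suggests enabling spark.shuffle.service.enabled. Disabling dynamic allocation should also avoid the conflict, though which option is appropriate depends on the cluster.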