Java Heap Size
Increase or decrease the Data Collector Java heap size as necessary, based on the resources available on the host machine. By default, the Java heap size is 1024 MB.
The Java heap size determines how much memory is allocated to Data Collector and limits the memory that Data Collector can use when it runs a pipeline. A running pipeline can use up to 65% of the allocated heap size.
Use the following Java options to define the Java heap size:
- Xmx - Defines the maximum heap size.
- Xms - Defines the minimum heap size.
Tip: To avoid constant recalculation of the allocated heap size, set both
properties to the same value. To define the unit of measure, use m for MB and g for GB.
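As a minimal sketch of the tip above, the following shell function builds both options from a single size value, so the maximum and minimum heap sizes cannot drift apart. The function name heap_opts is hypothetical and is not part of Data Collector. It assumes the size is a whole number followed by m or g.

```shell
#!/bin/sh
# heap_opts is a hypothetical helper, not part of Data Collector.
# Given one size with a unit suffix (m for MB, g for GB), it prints
# matching -Xmx and -Xms options so both use the same value.
heap_opts() {
    size="$1"
    # Strip one trailing unit letter to isolate the numeric part.
    digits="${size%[mg]}"
    case "$size" in
        *[mg]) ;;
        *) echo "size must end in m or g: $size" >&2; return 1 ;;
    esac
    case "$digits" in
        ''|*[!0-9]*) echo "size must be a whole number: $size" >&2; return 1 ;;
    esac
    printf '%s\n' "-Xmx${size} -Xms${size}"
}
```

For example, heap_opts 2g prints -Xmx2g -Xms2g, and heap_opts 2048 fails because the unit is missing.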
Define the heap size based on your installation:
- Tarball or RPM installation
- Define the heap size in the SDC_JAVA_OPTS environment variable.
For example, to double the heap size, increase the Xmx and Xms settings as follows:
export SDC_JAVA_OPTS="${SDC_JAVA_OPTS} -Xmx2048m -Xms2048m -server"
Set the environment variable using the method required by your installation type.
- Cloudera Manager installation
- Define the heap size in the Data Collector Advanced Configuration Snippet (Safety Valve) for sdc-env.sh field for the StreamSets service in Cloudera Manager.
With a heap size of 2048 MB, you can configure a pipeline to use up to 65% of the heap, or about 1331 MB of memory.
Note: In the pipeline properties, you can use the jvm:maxMemoryMB() function to help
define the percentage of the heap size the pipeline uses.
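The arithmetic behind the 1331 MB figure can be sketched in shell as follows. The function name max_pipeline_mb is hypothetical and is not part of Data Collector. It only applies the 65% limit stated above to a heap size given in MB and rounds down to a whole megabyte.

```shell
#!/bin/sh
# max_pipeline_mb is a hypothetical helper, not part of Data Collector.
# It prints 65% of a heap size given in MB, rounded down to a whole MB.
max_pipeline_mb() {
    heap_mb="$1"
    # Integer arithmetic: multiply first, then divide, to avoid losing precision.
    echo $((heap_mb * 65 / 100))
}
```

Calling max_pipeline_mb 2048 prints 1331, and max_pipeline_mb 1024 prints 665 for the default heap size.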