You can change the compute profile settings, which specify how and where a pipeline is executed. For example, a profile includes the type of cloud provider, the service to use on that provider (such as Dataproc), resources (memory and CPU), image, minimum and maximum node count, and other values.
Learn more about profiles on the CDAP documentation site.
One option is to create a new compute profile with a higher limit on worker memory, or to override worker memory for a single run of the pipeline. To create a new profile:
- Click System Admin in the top right, then click the Configuration tab
- Click System Compute profiles
- Click Create New Profile
- Choose Cloud Dataproc
- Leave the Project ID and Service account key blank
- Enter the required configuration for the worker nodes (for example, a higher worker memory value)
- Click on Save
Once the new compute profile is created, attach it to the pipeline: click Configure in the pipeline detail view, choose the newly created compute profile, and click Save.
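For the other option, overriding worker memory for a single run, a rough sketch of the runtime arguments might look like the following. It assumes that CDAP accepts profile property overrides as runtime arguments prefixed with `system.profile.properties.`, and that the Dataproc provisioner names its properties `workerMemoryMB` and `workerCPUs`; please verify both against the documentation for your version. The helper name `build_profile_overrides` and the values are only illustrative.

```python
import json

# Assumed prefix for per-run compute profile overrides; verify for your CDAP version.
PROFILE_PROPERTY_PREFIX = "system.profile.properties."


def build_profile_overrides(overrides):
    """Return runtime arguments that override profile properties for one run."""
    return {
        PROFILE_PROPERTY_PREFIX + name: str(value)
        for name, value in overrides.items()
    }


# Hypothetical values: 16 GB of memory and 4 CPUs per worker node.
runtime_args = build_profile_overrides({"workerMemoryMB": 16384, "workerCPUs": 4})

# Print the key/value pairs to enter as runtime arguments for the run.
print(json.dumps(runtime_args, indent=2))
```

The resulting key/value pairs would be entered as runtime arguments when starting the pipeline, so the saved profile itself stays unchanged.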
Additionally, check the autoscaling option in Data Fusion.