
There is no good documentation for dsl.ParallelFor, but I assume the parallelism parameter means how many pods can run in parallel. However, when I look at the DAG visualization, it seems to launch all of the loop's tasks at once, which exhausts the resource quota.

The step is stuck in the Pending state with this message: pods "pipeline-pdtbc-2302481418" is forbidden: exceeded quota: kf-resource-quota, requested: cpu=1500m, used: cpu=35100m, limited: cpu=36

Since my parallelism is set to 1, it should have run the tasks one by one rather than requesting this much CPU at once.
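
For reference, a minimal sketch of the kind of setup I mean, using the KFP v1 SDK (the pipeline name, image, and CPU request here are illustrative, not my actual code):

from kfp import dsl

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    # Expectation: parallelism=1 should run the loop body one pod at a time.
    with dsl.ParallelFor(list(range(30)), parallelism=1) as item:
        op = dsl.ContainerOp(
            name='work',
            image='busybox',
            command=['echo', item],
        )
        op.container.set_cpu_request('1500m')  # same request as in the quota error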


Hanan Shteingart

2 Answers


You need to set parallelism at the pipeline level as well, e.g. dsl.get_pipeline_conf().set_parallelism(10).
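
A minimal sketch of where that call goes (the pipeline body here is illustrative):

from kfp import dsl

@dsl.pipeline(name='my-pipeline')
def my_pipeline():
    # Caps the number of concurrently running pods for the whole pipeline.
    dsl.get_pipeline_conf().set_parallelism(10)
    with dsl.ParallelFor([1, 2, 3], parallelism=10) as item:
        ...  # loop body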

Anil

Apparently, it is a bug in Kubeflow: see https://github.com/kubeflow/pipelines/issues/6588.

Here is a hack fix:

import json

import yaml
from yaml import SafeLoader


def fix_parallelism(source_pipeline_path, parallelism=10, target_pipeline_path=None):
    """
    Limits the number of parallel tasks by patching the compiled pipeline YAML.

    Args:
        source_pipeline_path (str): path to the source pipeline YAML to be edited.
        parallelism (int): parallelism to use.
        target_pipeline_path (str): target path; defaults to the source path.

    Returns:
        None - writes the patched YAML to target_pipeline_path.
    """
    # see https://github.com/kubeflow/pipelines/issues/6588
    with open(source_pipeline_path, 'rt') as f:
        data = yaml.load(f, Loader=SafeLoader)
    # The compiled Argo Workflow stores the pipeline name in the
    # pipelines.kubeflow.org/pipeline_spec annotation.
    pipeline_name = json.loads(
        data['metadata']['annotations']['pipelines.kubeflow.org/pipeline_spec'])['name']
    # Find the top-level (pipeline) template and cap its parallelism,
    # which limits how many pods Argo runs concurrently.
    pipeline_index = [i for i, t in enumerate(data['spec']['templates'])
                      if t['name'] == pipeline_name][0]
    data['spec']['templates'][pipeline_index]['parallelism'] = parallelism
    target_pipeline_path = target_pipeline_path or source_pipeline_path
    with open(target_pipeline_path, 'wt') as f:
        yaml.dump(data, f)
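
Hypothetical usage, assuming the pipeline was compiled with the KFP v1 compiler (the function and file names are illustrative):

import kfp

kfp.compiler.Compiler().compile(my_pipeline, 'pipeline.yaml')
fix_parallelism('pipeline.yaml', parallelism=10)
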
Hanan Shteingart