When creating a PartitionSetDefinition in Dagster, you can pass in a 'mode' that swaps the resources used (for example, you may want cloud storage in PROD but local storage for local development and testing).

A mode usually needs a set of config values, which are typically provided in an environment YAML file. This is normally handled by defining a preset on the pipeline and using that preset for the run. However, a PartitionSetDefinition like the one below only accepts a mode, not a preset.

date_partition_set = PartitionSetDefinition(
    name="date_partition_set",
    pipeline_name="my_pipeline",
    partition_fn=get_date_partitions,
    run_config_fn_for_partition=run_config_for_date_partition,
    mode="test"
)

How can you provide the necessary preset/environment values for this?

j-hulbert

1 Answer

One way I've found to do this is to load the preset config into the run config as it is built for each partition, using some utilities that Dagster provides. I found this pattern in some of their unit tests:

test_base.yaml has the typical preset configs corresponding to test mode.

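import os

from dagster.utils import file_relative_path, load_yaml_from_globs

def config_path(relative_path):
    # Resolve a config file name relative to this module's environments directory.
    return file_relative_path(__file__, os.path.join("../my_pkg/environments/", relative_path))

def run_config_for_date_partition(partition):
    date = partition.value
    # Start from the base config that a preset for the "test" mode would normally supply.
    config_dict = load_yaml_from_globs(
        config_path("test_base.yaml"),
    )
    table_name = "table1"
    input_config = {"config": {"start_date": date, "table_name": table_name}}
    # Add the partition-specific solid config on top of the base config.
    config_dict["solids"] = {
        "download_snow_incremental_table": {**input_config}
    }
    return config_dict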

date_partition_set = PartitionSetDefinition(
    name="date_partition_set",
    pipeline_name="my_pipeline",
    partition_fn=get_date_partitions,
    run_config_fn_for_partition=run_config_for_date_partition,
    mode="test"
)
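
One caveat with the approach above: assigning config_dict["solids"] replaces any solid config that test_base.yaml already defines. If the base file might contain a "solids" section, a recursive merge avoids losing it. The following is only a sketch in plain Python, not part of Dagster's API; merge_run_config, base_config, and the keys inside base_config are hypothetical names and values used for illustration.

```python
from copy import deepcopy


def merge_run_config(base, overrides):
    """Recursively merge overrides into a copy of base; overrides win on conflicts."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts, so merge them key by key.
            merged[key] = merge_run_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-in for the dict loaded from test_base.yaml.
base_config = {
    "resources": {"storage": {"config": {"base_dir": "/tmp/test"}}},
    "solids": {"other_solid": {"config": {"limit": 10}}},
}

# Partition-specific config, shaped like the one built in run_config_for_date_partition.
partition_config = {
    "solids": {
        "download_snow_incremental_table": {
            "config": {"start_date": "2020-01-01", "table_name": "table1"}
        }
    }
}

run_config = merge_run_config(base_config, partition_config)
```

Inside run_config_for_date_partition, this would mean returning merge_run_config(config_dict, {"solids": {...}}) instead of assigning to config_dict["solids"] directly.
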
j-hulbert