Coming from Airflow, I used Jinja templates such as {{ds_nodash}} to inject a DAG's execution date into my scripts.

For example, I can detect and ingest the file for the first of August 2022 if it is named FILE_20220801.csv. I would have a DAG with a sensor and an operator that use FILE_{{ds_nodash}}.csv in their code. In other words, I was sure my DAG was idempotent with regard to its execution date.

I am now looking into Dagster because its asset abstraction is quite attractive. Dagster is also easy to set up and test locally. But I cannot find similar Jinja templates that would ensure the idempotency of my executions.

In other words, how do I make sure that data sent to me on a specific date is processed the same way even if I run it 1, 2 or N days later?

Imad
  • https://docs.dagster.io/concepts/partitions-schedules-sensors/partitions seems to be close to what I want. I will post an answer once I test it out. – Imad Sep 06 '22 at 09:54

1 Answer

If a file comes in every day (or hour, or week, etc.), and some of the assets that depend on the file have a partition for each file, then the recommended way to do this is with partitions. E.g.:

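from typing import Optional

from dagster import DailyPartitionsDefinition, asset, sensor, repository, define_asset_job

# One partition per day, with keys like "20220801" to match the file names.
# start_date is written in the same format as fmt, since it is parsed with it.
daily_partitions_def = DailyPartitionsDefinition(start_date="20200101", fmt="%Y%m%d")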

@asset(partitions_def=daily_partitions_def)
def asset1(context):
    path = f"FILE_{context.partition_key}.csv"
    ...

@asset(partitions_def=daily_partitions_def)
def asset2(context):
    ...

def detect_file() -> Optional[str]:
    """Return a value like '20220801', or None if no file is detected."""
    ...

all_assets_job = define_asset_job("all_assets", partitions_def=daily_partitions_def)

@sensor(job=all_assets_job)
def my_sensor():
    date_str = detect_file()
    if date_str:
        # Request a run for the partition that matches the detected file's date.
        # With run_key=None, requests are not deduplicated across sensor evaluations.
        return all_assets_job.run_request_for_partition(run_key=None, partition_key=date_str)

@repository
def repo():
    return [my_sensor, asset1, asset2]
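
To see why this plays the same role as {{ds_nodash}}, here is a minimal sketch in plain Python that does not call Dagster at all. It assumes the partition key is simply the day formatted with "%Y%m%d", as in the example above. The names PARTITION_FMT, partition_key_for, file_for_partition and keys_between are hypothetical helpers made up for illustration.

```python
from datetime import date, timedelta
from typing import List

# Assumption: same format string as the fmt argument in the example above.
PARTITION_FMT = "%Y%m%d"


def partition_key_for(day: date) -> str:
    """Format a calendar day the way a "%Y%m%d" daily partition key looks."""
    return day.strftime(PARTITION_FMT)


def file_for_partition(partition_key: str) -> str:
    """Build the file name from the partition key only, never from today's date."""
    return f"FILE_{partition_key}.csv"


def keys_between(start: date, end: date) -> List[str]:
    """List one key per day from start (inclusive) to end (exclusive)."""
    days = (end - start).days
    return [partition_key_for(start + timedelta(days=i)) for i in range(days)]


if __name__ == "__main__":
    key = partition_key_for(date(2022, 8, 1))
    print(key, file_for_partition(key))
```

Because the file name depends only on the partition key, re-running the partition for the first of August 1, 2 or N days later still resolves to FILE_20220801.csv.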
Sandy Ryza
  • Interesting! I will definitely spin this up and test it first hand, but I think this is very likely the answer to my question. – Imad Sep 06 '22 at 16:13