So let's say I have two solids. The first does some computations and writes a file to disk. The second solid takes that file and does other things with it, but it needs the file's filesystem path in order to open it. I can do this with two `yield`s (one for the `AssetMaterialization` and the other for the `str` `Output`) and explicitly passing the `Output` to the second solid call:

    from dagster import (AssetKey, AssetMaterialization, EventMetadataEntry,
                         Output, execute_pipeline, pipeline, solid)


    @solid
    def yield_asset(context):
        yield AssetMaterialization(
            asset_key=AssetKey('my_dataset'),
            description='Persisted result to storage',
            metadata_entries=[
                EventMetadataEntry.text('Text-based metadata for this event',
                                        label='text_metadata'),
                EventMetadataEntry.fspath('/path/to/data/on/filesystem'),
                EventMetadataEntry.url('http://mycoolsite.com/url_for_my_data',
                                       label='dashboard_url'),
            ],
        )
        yield Output('/path/to/data/on/filesystem')


    @solid
    def print_asset_path(context, asset_path: str):
        # do stuff with `asset_path`
        context.log.info(asset_path)


    @pipeline
    def some_pipeline():
        asset_path = yield_asset()
        print_asset_path(asset_path)


    if __name__ == "__main__":
        result = execute_pipeline(some_pipeline)

This works fine, and you should get the info message in the logs (`2021-03-16 13:23:29 - dagster - INFO - system - 366248ec-6a83-462f-b62f-9fb2514f6f80 - print_asset_path - /path/to/data/on/filesystem`) and the `AssetMaterialization` in `dagit`.
However, this is kind of inconvenient, since I need to explicitly yield an `Output` with the filesystem path that I need. Is it possible to reference the `AssetMaterialization` in the second solid and use its properties directly? If so, how?
Something like this (won't work):

    @solid
    def print_asset_path(context):
        # hypothetical API, shown only to illustrate what I'm after
        asset_path = context.assets.get_asset_by_key('my_key').fspath
        # do stuff with `asset_path`
        context.log.info(asset_path)
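
Short of something like that, the closest I can get is to at least state the path only once in the first solid. Below is a minimal sketch of that idea, not a Dagster feature: `emit_path_events` is a hypothetical helper of my own, and its `make_materialization` and `make_output` parameters are assumptions of the sketch. Inside a solid they would wrap `AssetMaterialization` and `Output` from the working example; here plain tuples stand in for them so the snippet runs without Dagster installed.

```python
from typing import Any, Callable, Iterator


def emit_path_events(
    path: str,
    make_materialization: Callable[[str], Any],
    make_output: Callable[[str], Any],
) -> Iterator[Any]:
    """Yield a materialization event, then an output event, for one path.

    Hypothetical helper (not part of Dagster): the path is written once
    and reused for both events.
    """
    yield make_materialization(path)
    yield make_output(path)


# Stand-in factories so the sketch runs on its own. In a real solid these
# would build an AssetMaterialization (with EventMetadataEntry.fspath(p))
# and an Output(p) instead of tuples.
events = list(
    emit_path_events(
        "/path/to/data/on/filesystem",
        make_materialization=lambda p: ("materialization", p),
        make_output=lambda p: ("output", p),
    )
)
```

With this, the body of `yield_asset` would presumably shrink to a single `yield from emit_path_events(...)`. The second solid still receives the path through the `Output`, so this only tidies up the first solid; it does not let the second solid look the asset up by key.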