Questions tagged [dagster]

Dagster is an open source system for building modern data applications.

Dagster, by Elementl, is a set of abstractions for building self-describing, testable, and reliable data applications. It uses functional data programming, gradual/optional typing, and testability to make it easier to compose data applications from DAGs of solids, Dagster's basic unit of computation.

142 questions
2 votes, 1 answer

Cross Validation using Dagster

I've started using Dagster in our ML pipeline and am running into some basic issues. I'm wondering if I'm missing something trivial here or if this is just how it is... Say I have a simple ML pipeline: Load raw data --> Process data into table…
moomima
2 votes, 1 answer

NoneType Error when trying to make a custom BeautifulSoup Dagster Type

I've been messing around with @dagster_type and was trying to make a custom HtmlSoup type. Basically a fancy @dagster_type wrapper around a BeautifulSoup Object. import requests from bs4 import BeautifulSoup from dagster import ( dagster_type, …
JohnMav
2 votes, 2 answers

Core compute for solid returned an output multiple times

I am very new to Dagster and I can't find an answer to my question in the docs. I have 2 solids: one that's yielding tuples(str, str) that are parsed from an XML file, and the other one that just consumes the tuples and stores objects in the DB with the corresponding fields set.…
2 votes, 2 answers

No value for argument in function call

I am very new to Python and am working through the Dagster hello tutorial. I have set up the following from the tutorial: import csv from dagster import execute_pipeline, execute_solid, pipeline, solid @solid def hello_cereal(context): #…
Kirsten
1 vote, 1 answer

NameError: name 'topstory_ids' is not defined

I am new to Dagster and trying to follow the tutorial from the official documentation (https://docs.dagster.io/tutorial/building-an-asset-graph). After I copied and pasted this code, I got this error: NameError: name 'topstory_ids' is not defined How do I…
Steve
1 vote, 0 answers

How to pass parameter to asset from op in dagster

I am new to Dagster and having difficulty running the asset for different parameter values when the job is scheduled. I have created a pipeline using Dagster. I am trying to materialize the outcome of the upstream asset multiple_num() and using an op to pass…
Hari
1 vote, 1 answer

Write a test for a Dagster asset job

I am trying to write a simple test for a Dagster job and I can't get it to work... I am using dagster 1.3.6. So I have defined this job using the function dagster.define_asset_job: from dagster import define_asset_job my_job:…
guillaume latour
1 vote, 0 answers

Docker AWS ECS integration on Mac NoCredentialProviders error when running "docker compose up"

When I try to run docker compose up to deploy my infrastructure to AWS using Docker's ECS integration, I get the following error: NoCredentialProviders: no valid providers in the chain. Deprecated. For verbose messaging see…
1 vote, 1 answer

Dagster: handle TypedDict as op output

Context To keep track of the origin of my Dataframes, across my different ops, I defined TaggedDF as class TaggedDF(TypedDict): df: DataFrame ressource_name: str So I could @op(retry_policy=OPS_RETRY_POLICY) def read_fec(context, file_name:…
zar3bski
1 vote, 1 answer

How do I add the materialization runtime to a software defined asset in Dagster?

I would like to keep track of how long it takes to materialize software defined assets over time (using Dagster). Ideally I'd add the "duration" to the materialization metadata. I could do this very crudely import time @asset def my_asset(): …
MYK
1 vote, 1 answer

Dagster - Execute an @op only when all parallel executions are finished (DynamicOutput)

I have a problem that I am not able to solve in Dagster. I have the following configuration: step 1 gets the data from an endpoint; step 2 gets a list of customers dynamically; step 3 is the database update with the response…
1 vote, 1 answer

Getting current execution date in a task or asset in dagster

Is there an easier way to get the current date in a Dagster asset than what I'm currently doing? def current_dt(): return datetime.today().strftime('%Y-%m-%d') @asset def my_task(current_dt): return current_dt In…
pyCthon
1 vote, 2 answers

How do I pass data to an op in a different module in dagster?

I am new to Dagster and am having a difficult time sorting this one out. I have two jobs defined in my Dagster pipeline and I want to pass data from an op in one job to an op in the other. My setup is as follows (simplified example): job1.py @op() def…
1 vote, 1 answer

Create success hook with telegram-bot alert

I'm new to Dagster and am trying to create a success hook that will send alerts through a Telegram bot. Need help, please. Resource: @resource def send_message(message): class TelegramConnection: def telegram_resource(message): botid…
Andrey
1 vote, 1 answer

How to create an EMR cluster and submit a spark-submit step using Dagster?

I want to create a Dagster app that creates an EMR cluster and adds a spark-submit step, but due to a lack of documentation or examples I can't figure out how to do that (copilot also struggles with it :-)). The idea is to create a scheduler with…