I have a CSV file with several columns of data that serve as inputs to the Python functions called by my PythonOperators. I want my DAG to read the CSV and, for each row, feed that row's values into the operators. How can I iterate my DAG over the CSV rows?
1 Answer
If you want to read a CSV file and process each row in a separate task, you can read the file in one task and use Dynamic Task Mapping (available since Airflow 2.3.0) to process the rows:
from airflow import DAG
from airflow.decorators import task

with DAG(dag_id="dag_id", start_date=...) as dag:

    @task
    def read_csv():
        # Load the CSV file and prepare the data to process
        csv_file = ...  # read csv_file
        data_process = ...  # a list of data calculated from csv_file
        # ex: [{"row": 1, "x": 1}, {"row": 2, "x": 1}, {"row": 3, "x": 2}]
        return data_process

    @task
    def processing(data_to_process):
        # Implement your processing function; it runs once per list element
        print(f"row data: {data_to_process}")

    data_to_process = read_csv()
    # Create one mapped task instance per element returned by read_csv
    processing.expand(data_to_process=data_to_process)
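
As a rough sketch of what the body of read_csv could look like, the standard-library csv module can turn each CSV row into a dict, which gives a list in the shape the example above expects. The helper name rows_from_csv and the sample columns (row, x) are assumptions for illustration only; it parses text from a string so it can be run without Airflow or a real file.

```python
import csv
import io


def rows_from_csv(csv_text):
    """Parse CSV text into a list of dicts, one dict per data row.

    Hypothetical helper: the header line supplies the dict keys.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


# Sample data with assumed column names, mirroring the example in the answer
sample = "row,x\n1,1\n2,1\n3,2\n"
rows = rows_from_csv(sample)
print(rows)
```

Note that csv.DictReader returns every value as a string, so numeric columns would need converting (for example with int) before use. Inside read_csv you would presumably open the real file and return the resulting list; expand should then create one processing task instance per dict.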

Hussein Awala