I was trying to write a txt file to the DAGs folder from a Cloud Composer DAG. The file never showed up, and I thought there was something wrong with my code, so as a test I saved a pandas dataframe in xlsx format to the DAGs folder and loaded it back.
That seemed to work: within the same DAG run I was able to write the dataframe and then read it again, but when I looked in the folder afterwards there was no file. If I run the code again and only try to read it, it says the file doesn't exist.
It's like the file gets written only temporarily.
I'm also using the folder's full path ("/home/airflow/gcs/dags"), and since I'm saving to the DAGs folder of my own Composer environment, I didn't expect to run into this much trouble.
Does anyone have any thoughts on how I can solve this?
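For context, this is roughly what "looking in the folder afterwards" means on my side: a plain os check in a later run, using the same path as in the snippet below (only the standard library is involved here):

```python
import os

# same path as in the snippet below
path = '/home/airflow/gcs/dags/mypath/test.xlsx'

print(os.path.exists(path))               # comes back False on a later run
print(os.listdir(os.path.dirname(path)))  # test.xlsx is not listed either
```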
EDIT:
Snippet of code:
```python
import os
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def _crawl_spiders():
    # set working dir to the mounted dags folder
    os.chdir('/home/airflow/gcs/dags/mypath')
    df = pd.read_excel('./x-path/sheet.xlsx')
    # write to the dags folder, then read it back within the same task
    df.to_excel('/home/airflow/gcs/dags/mypath/test.xlsx', index=False)
    b = pd.read_excel('/home/airflow/gcs/dags/mypath/test.xlsx')
    print(f'Success, b columns: {b.columns}')


with DAG(dag_id="crawler", start_date=datetime(2022, 7, 28),
         schedule_interval='@daily', tags=['muffet', 'crawler']) as dag:
    crawl_spiders = PythonOperator(
        task_id='crawl_spiders',
        python_callable=_crawl_spiders,
        dag=dag)
```
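I haven't yet checked the environment's bucket directly, only the mounted path. A check along these lines should show whether the object ever lands in GCS at all (the bucket name below is a placeholder for my environment's bucket, i.e. the one that /home/airflow/gcs is mounted from):

```python
from google.cloud import storage

# 'my-composer-bucket' is a placeholder for the Composer environment's bucket
client = storage.Client()
bucket = client.bucket('my-composer-bucket')

# does the object ever show up under the dags/ prefix?
print(bucket.blob('dags/mypath/test.xlsx').exists())

# list everything under dags/mypath/ to compare with what the mounted folder shows
for blob in client.list_blobs('my-composer-bucket', prefix='dags/mypath/'):
    print(blob.name)
```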