I am struggling to write a pyarrow table as a parquet file to an ADLS Gen2 storage container. I'm working in Azure Synapse Analytics using a notebook.
Here is what I am able to do:
- Mount the ADLS Gen2 account to access files. Spark uses its own synfs path syntax for this, e.g.:
df = spark.read.load(
    "synfs:/" + jobId + "/mnt/bronze/workday" + varFilepath,
    format="csv",
    header=True,
)
print(type(df))
df.show()
This works fine. I then convert it to a pandas DataFrame to do some manipulation. Now I want to write the result as a parquet file:
import pyarrow as pa
import pyarrow.parquet as pq

df_csv = df.toPandas()
pq_tbl = pa.Table.from_pandas(df_csv)
print(type(pq_tbl))
pq.write_table(pq_tbl, "workday/example.parquet", filesystem="synfs:/" + jobId + "/mnt/bronze")
The last call fails with this error: Unrecognized filesystem type in URI: synfs:/7/mnt/bronze
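
One direction that may be worth trying, sketched under assumptions rather than verified: the error suggests pyarrow only understands its own filesystem schemes and does not know synfs. As far as I understand, Synapse also exposes a mounted container through the local file system under /synfs/<jobId>/<mount point>, so a plain local path could be passed to pq.write_table with no filesystem argument. The helper names synfs_local_path and write_parquet_to_mount below are hypothetical, and I have not confirmed that writes through this local path are supported on every Synapse runtime.

```python
import posixpath


def synfs_local_path(job_id, mount_point, relative_path):
    """Build the local path for a file under a Synapse mount.

    Assumption: the mount is visible locally as /synfs/<job_id>/<mount_point>.
    """
    return posixpath.join(
        "/synfs",
        str(job_id),
        mount_point.strip("/"),
        relative_path.lstrip("/"),
    )


def write_parquet_to_mount(table, job_id, mount_point, relative_path):
    """Write a pyarrow Table to the mounted container via the local path."""
    # Imported here so the path helper above can be used without pyarrow.
    import pyarrow.parquet as pq

    target = synfs_local_path(job_id, mount_point, relative_path)
    # No filesystem argument: pyarrow treats a plain path as a local file.
    # The parent directory (here "workday") must already exist.
    pq.write_table(table, target)
    return target
```

With the values from the error message, synfs_local_path(7, "/mnt/bronze", "workday/example.parquet") gives /synfs/7/mnt/bronze/workday/example.parquet, and the write would then be write_parquet_to_mount(pq_tbl, jobId, "/mnt/bronze", "workday/example.parquet").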