
Using DuckDB, I am trying to write a data frame (from my VS Code session) to a Parquet file in an Azure Storage account. I get the error "Not implemented Error: Writing to HTTP files not implemented".

Reading works, though. I build the data frame from a CSV file stored in the same Azure Storage blob container, and DuckDB reads that CSV without problems from VS Code.

import duckdb

azure_storage_path = 'https://somename.blob.core.windows.net/the-container-name'
table_name = 'https://somename.blob.core.windows.net/the-container-name/the_csv.csv'

conn = duckdb.connect()
conn.execute('INSTALL httpfs')
conn.execute('LOAD httpfs')
# Reading the CSV over HTTPS works
df = conn.execute("""
    CREATE OR REPLACE TABLE some_table AS
    SELECT *
    FROM '""" + table_name + """'
    LIMIT 10
""").df()
# The error occurs on the line below
conn.execute("COPY (FROM some_table) TO '" + azure_storage_path + "/ParquetFile.parquet' (FORMAT 'parquet')")

My goal is to write the CSV data as a Parquet file in the Azure container.

Debottam

1 Answer

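The error is expected: DuckDB's httpfs extension can read files from plain HTTP(S) URLs, but writing to them is not implemented. HTTP doesn't really do a good job of abstracting filesystems (nor is it designed to).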

Instead, you can use DuckDB's fsspec support (which, full disclosure, I added) together with the adlfs package, which provides the abfs filesystem for Azure:

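import duckdb
from fsspec import filesystem

ACCOUNT_NAME = 'somename'  # storage account name (placeholder)
ACCOUNT_KEY = '<your-account-key>'  # placeholder, keep real keys out of source code

# this line will throw an exception if the appropriate filesystem interface (adlfs) is not installed;
# register it on the same connection (conn) that holds some_table
conn.register_filesystem(filesystem('abfs', account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY))

conn.execute("COPY (FROM some_table) TO 'abfs://the-container-name/ParquetFile.parquet' (FORMAT 'parquet')")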
Mause