I am trying to use DuckDB with the httpfs extension to query around 1000 Parquet files that share the same schema and sit under similar keys in an S3 bucket.
When I query a single file with DuckDB, I get the table back as expected:
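import duckdb
import pandas as pd  # not referenced directly, but .df() needs pandas installed

# In-memory DuckDB connection
cursor = duckdb.connect()
# Load httpfs, set the S3 credentials, then read one Parquet file into a DataFrame
df = cursor.execute(f"""
INSTALL httpfs;
LOAD httpfs;
SET s3_region='{s3_region}';
SET s3_access_key_id='{access_key_id}';
SET s3_secret_access_key='{secret_access_key}';
SELECT *
FROM parquet_scan(['s3://bucket/folder/fname.parquet'],
FILENAME = 1);
""").df()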
However, when I use file globbing, as explained in the docs (https://duckdb.org/docs/extensions/httpfs), I get a duckdb.Error: Invalid Error: HTTP GET error, which is an HTTP 403 (Access Denied). The failing query is:
SELECT *
FROM parquet_scan(['s3://bucket/folder/*.parquet'],
FILENAME = 1);
I thought this was just an AWS IAM permissions issue, but I've given list and read access to the entire bucket, so as far as I know, it isn't that.
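One way to check that assumption outside DuckDB is to try a list call and an object read separately with the same credentials. The sketch below is only a diagnostic. `check_s3_access` is a hypothetical helper, not part of DuckDB or boto3, and it assumes the client passed in behaves like a boto3 S3 client (it only calls `list_objects_v2` and `head_object`). It also assumes that the glob needs a bucket listing to find matching keys, while the single-file query only needs object reads.

```python
from typing import Any, Dict


def check_s3_access(client: Any, bucket: str, prefix: str, key: str) -> Dict[str, str]:
    """Try one list call and one object read; report "ok" or the error text for each."""
    results: Dict[str, str] = {}

    # Listing is a bucket-level operation (assumed to be what a glob relies on).
    try:
        client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        results["list"] = "ok"
    except Exception as exc:  # with boto3 this would be botocore.exceptions.ClientError
        results["list"] = f"failed: {exc}"

    # Reading is an object-level operation (all that a single-file query should need).
    try:
        client.head_object(Bucket=bucket, Key=key)
        results["read"] = "ok"
    except Exception as exc:
        results["read"] = f"failed: {exc}"

    return results


# Hypothetical usage with boto3 (requires boto3 and network access):
# import boto3
# client = boto3.client(
#     "s3",
#     region_name=s3_region,
#     aws_access_key_id=access_key_id,
#     aws_secret_access_key=secret_access_key,
# )
# print(check_s3_access(client, "bucket", "folder/", "folder/fname.parquet"))
```

If "read" comes back "ok" but "list" fails, the credentials can fetch objects but cannot list the bucket, which would match a single file working while the glob returns 403.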
What is causing this error?