I'm looking for a way to load data from an Azure Data Lake Gen2 container using Dask. The container holds only Parquet files, and all I have is the account name, the account endpoint, and a SAS token.
With the Azure SDK's FileSystemClient, I can navigate the file system easily using just those values:
from azure.storage.filedatalake import FileSystemClient

azure_file_system_client = FileSystemClient(
    account_url=endpoint,
    file_system_name="container-name",
    credential=sas_token,
)
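For example, listing the Parquet files in the container works fine with just the SAS token (the "data/" prefix here is only illustrative, not an actual path requirement):

# List paths under the container to confirm the SAS token works;
# "data/" is just an example prefix from my container.
for path in azure_file_system_client.get_paths(path="data/", recursive=True):
    if path.name.endswith(".parquet"):
        print(path.name)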
When I try to do the same in Dask through the abfs protocol (with adlfs as the storage backend):
import dask.dataframe as dd

ENDPOINT = f"https://{ACCOUNT_NAME}.dfs.core.windows.net"
storage_options = {"connection_string": f"{ENDPOINT}/{CONTAINER_NAME}/?{sas_token}"}

ddf = dd.read_parquet(
    f"abfs://{CONTAINER_NAME}/**/*.parquet",
    storage_options=storage_options,
)
I get the following error:
ValueError: unable to connect to account for Connection string missing required connection details.
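I suspect the value I'm building isn't really a connection string, since Azure connection strings are semicolon-separated key=value pairs (e.g. BlobEndpoint=...;SharedAccessSignature=...) rather than a URL. From the adlfs README it looks like the account name and SAS token can instead be passed as separate storage_options keys, so I'm considering something like this untested sketch:

import dask.dataframe as dd

# Untested idea: pass account_name and sas_token separately instead of
# building a connection string (I'm unsure whether the token should
# keep its leading "?").
storage_options = {"account_name": ACCOUNT_NAME, "sas_token": sas_token}

ddf = dd.read_parquet(
    f"abfs://{CONTAINER_NAME}/**/*.parquet",
    storage_options=storage_options,
)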
Any thoughts? Thanks in advance :)