
I'm looking for a way to load data from an Azure Data Lake Gen2 container using Dask. The container holds only Parquet files, but all I have is the account name, the account endpoint, and a SAS token.

When I use the Azure SDK's `FileSystemClient`, I can navigate the file system easily with only those values:

from azure.storage.filedatalake import FileSystemClient

azure_file_system_client = FileSystemClient(
    account_url=endpoint,
    file_system_name="container-name",
    credential=sas_token,
)

But when I try to do the same in Dask via the `abfs://` protocol, with adlfs as the backend:

import dask.dataframe as dd

ENDPOINT = f"https://{ACCOUNT_NAME}.dfs.core.windows.net"
storage_options = {"connection_string": f"{ENDPOINT}/{CONTAINER_NAME}/?{sas_token}"}
ddf = dd.read_parquet(
    f"abfs://{CONTAINER_NAME}/**/*.parquet",
    storage_options=storage_options,
)

I get the following error:

ValueError: unable to connect to account for Connection string missing required connection details.
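For context, one thing I considered trying instead: adlfs also accepts the account name and SAS token as separate `storage_options` keys rather than a connection string. A minimal sketch of what that would look like (the account name, container, and token values below are placeholders, not my real credentials):

```python
# Hypothetical placeholder values standing in for the real credentials.
ACCOUNT_NAME = "myaccount"
CONTAINER_NAME = "container-name"
sas_token = "sv=...&sig=..."

# adlfs recognizes "account_name" and "sas_token" as separate keys,
# so no connection string needs to be assembled by hand.
storage_options = {
    "account_name": ACCOUNT_NAME,
    "sas_token": sas_token,
}

# The read itself would then be (not run here, since it needs a live account):
# import dask.dataframe as dd
# ddf = dd.read_parquet(
#     f"abfs://{CONTAINER_NAME}/**/*.parquet",
#     storage_options=storage_options,
# )
```

I haven't been able to verify whether this avoids the error above, so I'd welcome confirmation either way.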

Any thoughts? Thanks in advance :)

