
I am using this code to read multi-part Parquet files with the prefix '/data/key' from a private S3-compatible bucket (not hosted on AWS):

import dask.dataframe as dd
dd.read_parquet(
    's3://ns1/data/key',
    storage_options={
        'key': 'key',
        'secret': 'secret',
        'client_kwargs': {'endpoint_url': 'https://s3.sample-private-cloud.com'}
    }
)

Why am I getting this error?

TypeError: 'coroutine' object is not iterable

I was able to download the file using the boto3 client, but I am unable to read it using Dask. The Dask documentation doesn't mention asynchronous processing (await, async) anywhere, so I'm not sure why I am getting this error.

nolio

1 Answer


Using this code to read multi-part Parquet files with the prefix '/data/key'

If you are trying to load all files with the prefix 'data/key', you should add a * wildcard at the end of the path, like this: 'data/key*'. For example:

import dask.dataframe as dd
dd.read_parquet(
    's3://ns1/data/key*',
    storage_options={
        'key': 'key',
        'secret': 'secret',
        'client_kwargs': {'endpoint_url': 'https://s3.sample-private-cloud.com'}
    }
)
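
To see why the trailing * matters, here is a minimal sketch of how a glob-style pattern selects object keys. The key names are hypothetical, and Python's standard fnmatchcase is only a stand-in for the globbing done by the S3 filesystem layer (unlike typical filesystem globbing, its * also matches '/'), so treat it as an illustration rather than the exact matching rules Dask applies:

```python
from fnmatch import fnmatchcase

# Hypothetical object keys inside the bucket (illustrative only).
keys = [
    "data/key.part0.parquet",
    "data/key.part1.parquet",
    "data/other.parquet",
]


def match_keys(pattern, candidates):
    """Return the candidates that match a glob-style pattern."""
    return [k for k in candidates if fnmatchcase(k, pattern)]


# Without a wildcard the pattern must equal a key exactly,
# so a bare prefix selects nothing.
exact = match_keys("data/key", keys)

# With a trailing '*' every key that starts with the prefix is selected.
wildcard = match_keys("data/key*", keys)

print(exact)
print(wildcard)
```

With these assumed keys, the bare prefix selects no objects, while 'data/key*' selects both part files and leaves 'data/other.parquet' out.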
locorecto