S3Fs is a Pythonic file interface to S3; does Dask have any comparable Pythonic interface to Azure Storage Blob? The Python SDKs for Azure Storage Blob provide ways to read and write blobs, but the interface requires the file to be downloaded from the cloud to the local machine. I am looking for a solution that reads the blob in a way that supports Dask's parallel reads, either as a stream or as a string, without persisting it to local disk.
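For reference, the S3 pattern I have in mind looks roughly like this (a minimal sketch; the bucket name and path are placeholders):

import dask.dataframe as dd

# With s3fs installed, Dask reads the CSVs straight from S3 with no local copy
df = dd.read_csv('s3://mybucket/path/to/*.csv')
print(df.head())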
- Might azure-datalake storage be a solution for you? – mdurant Dec 10 '17 at 21:50
- It's an option; could you point me to solutions implemented using Azure Data Lake? – Charles Selvaraj Dec 14 '17 at 04:53
1 Answer
I have newly pushed code here: https://github.com/dask/dask-adlfs
You may pip-install from that location, although you may be best served by conda-installing the requirements (dask, cffi, oauthlib) beforehand. In a Python session, doing import dask_adlfs
will be enough to register the backend with Dask, so that thereafter you can use Azure URLs with Dask functions like:
import dask.dataframe as dd

# storage_options passes the Azure AD credentials through to the adl backend
df = dd.read_csv('adl://mystore/path/to/*.csv', storage_options={
    'tenant_id': 'mytenant', 'client_id': 'myclient',
    'client_secret': 'mysecret'})
Since this code is totally brand new and untested, expect rough edges. With luck, you can help iron out those edges.
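As a follow-up sketch (the store name, path, and credentials above are placeholders), the dataframe returned by read_csv is lazy, so nothing is persisted to local disk; data is only streamed from Azure when you trigger a computation:

# head() reads just the first partition from the store
print(df.head())

# a full pass streams every partition in parallel, still without a local copy
print(len(df))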

mdurant