
I tried to open a Parquet file on Azure Data Lake Storage Gen2 with vaex, using a generated SAS URL (with the expiry time and token embedded in the URL), by doing:

vaex.open(sas_url)

and got this error:

ERROR:MainThread:vaex:error opening '<SAS URL, redacted for security reasons>' ValueError: Do not know how to open <SAS URL, redacted>, no handler for https is known

How do I get vaex to read the file? Or is there another Azure storage option that works better with vaex?

Temiloluwa
  • Hi @Temiloluwa, I get the same error, even when trying with the blob URL. Also, there is no documentation showing vaex integrated with Azure Storage; the official docs only give examples for AWS S3 and Google Cloud Storage (see https://vaex.io/docs/example_io.html). I will update you if I find anything useful. – Utkarsh Pal Aug 19 '21 at 12:30

2 Answers


I finally found a solution! Vaex can read files in Azure Blob Storage through an adlfs filesystem:

import vaex
import adlfs

storage_account = "..."
account_key = "..."
container = "..."
object_path = "..."

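# Authenticate with the storage account key (adlfs also accepts other credentials)
fs = adlfs.AzureBlobFileSystem(account_name=storage_account, account_key=account_key)
# Pass the filesystem to vaex so it can resolve the abfs:// path
df = vaex.open(f"abfs://{container}/{object_path}", fs=fs)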

For more details, see the GitHub issue where I found the solution: https://github.com/vaexio/vaex/issues/1272
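Since the question starts from a SAS URL rather than an account key, the same approach might be adapted by handing the token to adlfs through its `sas_token` argument. This is a minimal sketch, assuming the URL has the usual form `https://<account>.<blob|dfs>.core.windows.net/<container>/<path>?<token>` and assuming adlfs accepts the query string (without the leading `?`) as the token. `split_sas_url` and `open_with_sas` are hypothetical helper names, not part of vaex or adlfs.

```python
from urllib.parse import unquote, urlsplit


def split_sas_url(sas_url):
    """Split an Azure SAS URL into (account_name, container, blob_path, sas_token).

    Assumes the layout https://<account>.<blob|dfs>.core.windows.net/<container>/<path>?<token>.
    """
    parts = urlsplit(sas_url)
    if not parts.hostname:
        raise ValueError("SAS URL has no host")
    # The storage account name is the first label of the host name.
    account_name = parts.hostname.split(".")[0]
    # The first path segment is the container; the rest is the blob path.
    container, _, blob_path = parts.path.lstrip("/").partition("/")
    if not container or not blob_path or not parts.query:
        raise ValueError("expected .../<container>/<path>?<sas token>")
    # Blob names are stored decoded, so undo percent-encoding (e.g. %20).
    return account_name, container, unquote(blob_path), parts.query


def open_with_sas(sas_url):
    """Open the file behind a SAS URL with vaex via an adlfs filesystem (assumed usage)."""
    # Imported lazily so split_sas_url works without these packages installed.
    import adlfs
    import vaex

    account_name, container, blob_path, sas_token = split_sas_url(sas_url)
    fs = adlfs.AzureBlobFileSystem(account_name=account_name, sas_token=sas_token)
    return vaex.open(f"abfs://{container}/{blob_path}", fs=fs)
```

Usage would then be `df = open_with_sas(sas_url)`. Splitting the URL first keeps the token out of the path string that vaex may echo in error messages.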

Temiloluwa

vaex.open cannot read data from an https source, which is why you get the error "no handler for https is known".

Also, according to the documentation, vaex supports reading data from Amazon S3 buckets and Google Cloud Storage:

Cloud support:
  • Amazon Web Services S3
  • Google Cloud Storage
  • Other cloud storage options

The documentation says other cloud storage options are also supported, but I could not find any example of fetching data from an Azure storage account, let alone with a SAS URL.

See also the vaex API documentation for more information.
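Because vaex has no handler for the https scheme, a cruder workaround is to download the file first and open the local copy. This is a sketch under two assumptions: the file fits on local disk (the whole object is copied), and vaex picks a reader from the file extension, which is why the helper keeps the `.parquet` suffix from the URL path while dropping the query string that holds the SAS token. `download_to_temp` is a hypothetical helper name.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlsplit


def download_to_temp(url):
    """Copy the object at `url` into a local temp file and return its path.

    The extension comes from the URL path only (not the query string, where a
    SAS token lives), so the local file keeps e.g. its ".parquet" suffix.
    """
    suffix = os.path.splitext(urlsplit(url).path)[1]
    fd, local_path = tempfile.mkstemp(suffix=suffix)
    # Stream the response to disk instead of reading it all into memory.
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
        shutil.copyfileobj(resp, out)
    return local_path
```

Usage would be `df = vaex.open(download_to_temp(sas_url))`. The temporary file is not deleted automatically, so remove it once you are done with the DataFrame.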

Utkarsh Pal