How to read Parquet files directly from Azure Data Lake without Spark?

Question

I am new to Azure and have some .parquet data files stored in the data lake. I want to read them into a dataframe (pandas or dask) using Python. Is there a way to read Parquet files in Python other than using Spark? I do not want to download the data to my local machine; I want to read the files directly.

Any suggestions?

Does this answer your question? [How can i read a file from Azure Data Lake Gen 2 using python](https://stackoverflow.com/questions/61579841/how-can-i-read-a-file-from-azure-data-lake-gen-2-using-python) – Chris Dec 07 '21 at 14:24
I am looking for a solution that does not use Spark. Or is using Spark the only way? – Rebe Dec 08 '21 at 10:05
Right-click the file in Azure Storage Explorer, get the SAS URL, and use pandas `read_csv` on the URL. – Chris Dec 08 '21 at 13:18

score 0 · Answer 1 · answered Dec 09 '21 at 08:17

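You can read Parquet files directly with `pd.read_parquet()`. Here is a sample that worked for me.

import pandas as pd

# SAS URL of the blob, including the ?sv=...&sig=... token
source = '<Your Blob SAS URL>'
# pandas fetches the file over HTTPS and parses it in memory
df = pd.read_parquet(source)
print(df)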

Reference: Read file from Azure Blob storage to directly to data frame using Python
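If you only have the account name, container, blob path and SAS token rather than a ready-made URL, the pieces can be put together by hand. This is just a rough sketch: `build_blob_sas_url` and `read_parquet_from_sas` are hypothetical helper names, the account and token values are placeholders, and it assumes the standard blob endpoint `https://<account>.blob.core.windows.net`. The `columns` argument of `pd.read_parquet` restricts which columns are loaded into the dataframe.

```python
from urllib.parse import quote


def build_blob_sas_url(account, container, blob_path, sas_token):
    """Assemble https://<account>.blob.core.windows.net/<container>/<path>?<sas>.

    Hypothetical helper: assumes the public Azure blob endpoint.
    """
    token = sas_token.lstrip("?")  # accept tokens with or without a leading '?'
    # quote() percent-encodes spaces etc. but keeps '/' separators intact
    return (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{quote(blob_path)}?{token}"
    )


def read_parquet_from_sas(url, columns=None):
    """Read a Parquet blob into a pandas dataframe without saving it locally."""
    # Imported here so the URL helper above works without pandas installed
    import pandas as pd

    return pd.read_parquet(url, columns=columns)


# Placeholder values, not a real account or token
url = build_blob_sas_url("myaccount", "data", "raw/my file.parquet", "?sv=2021&sig=abc")
print(url)
# df = read_parquet_from_sas(url, columns=["col_a", "col_b"])  # needs network access
```

The SAS token only needs read permission on the blob for this to work.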

SwethaKandikonda

Thanks. I figured out a way using `pd.read_parquet(path, filesystem=...)` to read any file in the blob. – Rebe Dec 13 '21 at 09:43

score 0 · Accepted Answer · answered Feb 21 '22 at 14:22

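I have found an efficient way to read Parquet files from the data lake into a pandas dataframe in Python. It uses the `pyarrowfs_adlgen2` package to expose the storage account as a pyarrow filesystem, which `pd.read_parquet` accepts through its `filesystem` argument. The code is as follows, for anyone looking for an answer: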

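import azure.identity
import pandas as pd
import pyarrow.fs
import pyarrowfs_adlgen2

# Authenticate with whatever DefaultAzureCredential can find (environment variables, managed identity, Azure CLI login, ...)
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
    'YOUR_ACCOUNT_NAME', azure.identity.DefaultAzureCredential())

# Wrap the handler so pyarrow and pandas can use it as a filesystem
fs = pyarrow.fs.PyFileSystem(handler)

# Paths are given as <container>/<path inside the container>
df = pd.read_parquet('container/dataset.parq', filesystem=fs)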
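Since the question also mentions dask: another route, which I have not benchmarked against the one above, is the fsspec-based `adlfs` package. With it installed, both pandas and dask accept `abfs://` URLs plus a `storage_options` dict. The sketch below is an illustration under those assumptions: the helper names (`abfs_url`, `adls_storage_options`, `read_with_pandas`, `read_with_dask`) are made up, and the account name and token are placeholders.

```python
def abfs_url(container, path):
    """Return an fsspec-style URL such as abfs://container/folder/file.parquet."""
    return f"abfs://{container}/{path.lstrip('/')}"


def adls_storage_options(account_name, sas_token=None, account_key=None):
    """Collect only the credentials that were supplied, using adlfs keyword names."""
    options = {"account_name": account_name}
    if sas_token:
        options["sas_token"] = sas_token
    if account_key:
        options["account_key"] = account_key
    return options


def read_with_pandas(container, path, **creds):
    """Read one Parquet file into pandas (needs pandas, pyarrow and adlfs)."""
    import pandas as pd  # imported lazily so the helpers above stay dependency-free

    return pd.read_parquet(
        abfs_url(container, path), storage_options=adls_storage_options(**creds)
    )


def read_with_dask(container, pattern, **creds):
    """Build a lazy dask dataframe over one or many Parquet files (needs dask and adlfs)."""
    import dask.dataframe as dd

    return dd.read_parquet(
        abfs_url(container, pattern), storage_options=adls_storage_options(**creds)
    )


# Placeholder values, not a real account or token
print(abfs_url("container", "dataset.parq"))
print(adls_storage_options("YOUR_ACCOUNT_NAME", sas_token="sv=..."))
# ddf = read_with_dask("container", "folder/*.parquet",
#                      account_name="YOUR_ACCOUNT_NAME", sas_token="sv=...")
```

The dask version only reads data when you call something like `.compute()`, which helps when the files are larger than memory.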

Rebe

As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – user11717481 Feb 21 '22 at 14:28
