1

I am new to Azure and have some .parquet data files stored in the data lake. I want to read them into a dataframe (pandas or dask) using Python. Is there a way to read Parquet files in Python without using Spark? I do not want to download the data to my local machine; I want to read the files directly from storage.

Any suggestions?

Rebe
  • Does this answer your question? [How can i read a file from Azure Data Lake Gen 2 using python](https://stackoverflow.com/questions/61579841/how-can-i-read-a-file-from-azure-data-lake-gen-2-using-python) – Chris Dec 07 '21 at 14:24
  • I am looking for a solution that does not use Spark, or using spark is the only way? – Rebe Dec 08 '21 at 10:05
  • Right-click the file in Azure Storage Explorer, get the SAS URL, and call pandas `read_parquet` on that URL – Chris Dec 08 '21 at 13:18

2 Answers

0

You can read Parquet files directly from a blob SAS URL with `pd.read_parquet()`. Here is a sample that worked for me.

import pandas as pd

# SAS URL that points at the Parquet blob
source = '<Your Blob SAS URL>'
df = pd.read_parquet(source)
print(df)


Reference: Read file from Azure Blob storage to directly to data frame using Python
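
If you only have the account name, container, blob path, and a SAS token rather than a ready-made URL, the pieces can be joined by hand. This is a minimal sketch, not part of the original answer: `build_blob_sas_url` and `read_parquet_from_sas` are hypothetical helper names, and it assumes the standard `https://<account>.blob.core.windows.net/<container>/<blob>?<token>` blob endpoint layout.

```python
from urllib.parse import quote


def build_blob_sas_url(account_name, container, blob_path, sas_token):
    """Join the parts of a blob SAS URL (hypothetical helper, not an Azure SDK call)."""
    # Percent-encode spaces and similar characters, but keep "/" as the path separator.
    encoded_path = quote(blob_path.lstrip("/"), safe="/")
    # SAS tokens are sometimes copied with a leading "?"; drop it to avoid "??".
    token = sas_token.lstrip("?")
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{container}/{encoded_path}?{token}"
    )


def read_parquet_from_sas(url, columns=None):
    """Read the blob behind a SAS URL into a pandas DataFrame."""
    import pandas as pd  # imported here so the URL helper works without pandas

    # columns=None loads every column; pass a list to load only some of them.
    return pd.read_parquet(url, columns=columns)
```

Usage would look like `df = read_parquet_from_sas(build_blob_sas_url('myaccount', 'mycontainer', 'folder/data.parquet', token))`, where the account, container, and path are placeholders for your own values.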

SwethaKandikonda
  • Thanks. I figured out a way to read any file in the blob using `pd.read_parquet(path, filesystem=...)`. – Rebe Dec 13 '21 at 09:43
0

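I found an efficient way to read Parquet files into a pandas DataFrame in Python. The `pyarrowfs_adlgen2` package exposes the Data Lake Gen2 account as a pyarrow filesystem, which `pd.read_parquet` accepts through its `filesystem` argument, so the file is read directly from storage. The code is as follows, for anyone looking for an answer: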

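import azure.identity
import pandas as pd
import pyarrow.fs
import pyarrowfs_adlgen2

# Authenticate against the storage account with the default Azure credential chain
handler = pyarrowfs_adlgen2.AccountHandler.from_account_name(
    'YOUR_ACCOUNT_NAME', azure.identity.DefaultAzureCredential())

# Wrap the handler so pyarrow (and therefore pandas) can use it as a filesystem
fs = pyarrow.fs.PyFileSystem(handler)

# The path is given as '<container>/<path inside the container>'
df = pd.read_parquet('container/dataset.parq', filesystem=fs)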
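
Since the question also mentions dask, here is a rough sketch of a dask variant. It is not the method shown above: it assumes the `adlfs` package is installed (it provides the `abfs://` protocol for fsspec) and that you authenticate with a SAS token. `adls_parquet_target` and `read_with_dask` are hypothetical helper names.

```python
def adls_parquet_target(account_name, container, path, sas_token):
    """Build the (url, storage_options) pair for an fsspec-style reader (hypothetical helper)."""
    # abfs:// URLs are addressed as abfs://<container>/<path inside the container>
    url = f"abfs://{container}/{path.lstrip('/')}"
    # Assumption: authentication by SAS token; adlfs also accepts an account_key
    storage_options = {"account_name": account_name, "sas_token": sas_token}
    return url, storage_options


def read_with_dask(account_name, container, path, sas_token, columns=None):
    """Lazily open a Parquet dataset in the data lake as a dask DataFrame."""
    import dask.dataframe as dd  # needs dask and adlfs installed

    url, storage_options = adls_parquet_target(
        account_name, container, path, sas_token)
    # Nothing is loaded until .compute() (or similar) is called on the result
    return dd.read_parquet(url, columns=columns, storage_options=storage_options)
```

Calling `read_with_dask('YOUR_ACCOUNT_NAME', 'container', 'dataset.parq', token).compute()` would then materialize the data as a pandas DataFrame.
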
Rebe
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – user11717481 Feb 21 '22 at 14:28