I am trying to read a CSV file from Azure Data Lake Gen2 and then perform some operations on each row. However, the requirement is not to download the file to a physical location, so I am using file_client.download_file().readall() to get the file contents as bytes.

However, I am unable to split the contents into rows/columns and get them into a list.

 x = file_client.download_file()
 bystream = x.readall()

What should I do with this bystream?

I am, however, able to do this with a downloaded file by using with open(...) and then passing the opened file to csv.reader().

Can someone please help with handling this bytestream?

sogeking

1 Answer

A late update: I was able to resolve this issue by wrapping the downloaded bytes in a text I/O stream. (I didn't need to convert it to a list, as a pandas DataFrame was the better option.)

Here is the code snippet that worked:

 import io
 import pandas as pd
 stream = io.StringIO(file_client.download_file().readall().decode("utf-8"))
 dataframe1 = pd.read_csv(stream, sep="|")

Here, file_client is the connection to the CSV file stored in Azure Data Lake. The code downloads the file contents into memory, decodes them as UTF-8, and loads them into a DataFrame using "|" as the column separator. There is no need to write the file to a local location.
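
If a plain list of rows is what you want (as in the original question) rather than a DataFrame, the same decode-and-wrap idea should also work with csv.reader. The following is only a minimal sketch: the helper name rows_from_bytes and the sample bytes are made up for illustration, and the sample stands in for whatever file_client.download_file().readall() returns. It also assumes the file is UTF-8 and pipe-delimited, as in the snippet above.

```python
import csv
import io


def rows_from_bytes(data, delimiter="|", encoding="utf-8"):
    """Parse CSV bytes held in memory into a list of rows (lists of strings)."""
    # newline="" leaves line endings untouched so the csv module can handle them
    text_stream = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text_stream, delimiter=delimiter))


# Hypothetical stand-in for: file_client.download_file().readall()
sample = b"id|name\n1|alice\n2|bob\n"

rows = rows_from_bytes(sample)
for row in rows:
    print(row)
```

Each row comes back as a list of strings, so any per-row processing can be done in the loop without touching the local disk.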