
I need to read a large CSV file efficiently from an Azure Blob Storage container using a Python Azure Function.

I am using the code below to read the CSV. It works fine for small files, but there must be a more efficient way to read larger ones.

from io import StringIO

import pandas as pd
from azure.storage.blob import ContainerClient

# Container connection.
container_client = ContainerClient.from_connection_string(
    conn_str=conn_str,
    container_name=container_name
)

# Read the blob and parse it into a DataFrame.
downloaded_blob = container_client.download_blob("file_name.csv")
df = pd.read_csv(StringIO(downloaded_blob.content_as_text()))

The code above takes too long to read a ~2 GB file. I need help reading large CSVs efficiently in a Python Azure Function.

Kalyan Rao
1 Answer


One workaround is to process the file in chunks, which keeps memory use low while parsing.

import pandas as pd

chunksize = 10 ** 6  # rows per chunk
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

NOTE: The chunksize parameter is the number of rows per chunk.
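
Since pd.read_csv with chunksize needs a path or file-like object, one way to tie this together with Blob Storage is to stream the blob to a temporary file first and then parse that file in chunks. Below is a minimal sketch, reusing the conn_str, container_name, "file_name.csv", and process() placeholders from above; readinto streams the download to disk instead of materializing ~2 GB as a Python string, and max_concurrency is an optional download_blob parameter that parallelizes the transfer.

import tempfile

import pandas as pd
from azure.storage.blob import ContainerClient

container_client = ContainerClient.from_connection_string(
    conn_str=conn_str,
    container_name=container_name
)

# Stream the blob to a temp file on disk rather than decoding the
# whole download into memory at once.
with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    downloader = container_client.download_blob(
        "file_name.csv", max_concurrency=4
    )
    downloader.readinto(tmp)
    tmp_path = tmp.name

# Parse the temp file in chunks of one million rows each.
chunksize = 10 ** 6
for chunk in pd.read_csv(tmp_path, chunksize=chunksize):
    process(chunk)  # hypothetical per-chunk processing

Keep in mind that local temp disk space can be limited depending on the Function App hosting plan, so check that your plan has room for files of this size.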


REFERENCES: pandas.read_csv

SwethaKandikonda