
I'm reading a large CSV file in chunks using chunksize (pandas DataFrame), like so:

import pandas as pd

reader = pd.read_csv('log_file.csv', low_memory=False, chunksize=4e7)

I know I could just calculate the number of chunks it will take to read the file, but I would like to determine it automatically and save the number of chunks in a variable, like so (in pseudocode):

number_of_chunks = countChunks(reader)

Any ideas?


1 Answer


You can use a generator expression to iterate through reader (the TextFileReader returned by read_csv when chunksize is specified) and add 1 for each chunk:

number_of_chunks = sum(1 for chunk in reader)
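
Note that iterating over the reader this way consumes it, so you cannot reuse the same reader afterwards to read the data. If you also need to process the chunks, a minimal sketch (reusing the file name and chunk size from the question) is to count while you process:

import pandas as pd

reader = pd.read_csv('log_file.csv', low_memory=False, chunksize=int(4e7))

number_of_chunks = 0  # stays 0 if the file yields no chunks
for number_of_chunks, chunk in enumerate(reader, start=1):
    ...  # process each chunk (a DataFrame) here

print(number_of_chunks)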

Alternatively, you can use a generator expression to count the number of lines in the file (the same idea as the first option, but iterating over the lines of the file), then divide that number by the chunk size and round the result up (with math.ceil):

import math

chunksize = 4e7  # the same chunk size passed to read_csv
with open('log_file.csv', 'r') as f:
    number_of_rows = sum(1 for row in f)  # note: this count includes the header line, if there is one
number_of_chunks = math.ceil(number_of_rows / chunksize)

or, as a one-liner:

import math
number_of_chunks = math.ceil(sum(1 for row in open('log_file.csv', 'r')) / chunksize)

In my tests, the second approach performed better than the first.
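
If you want to reproduce the comparison on your own data, here is a minimal timing sketch, assuming the file name and chunk size from the question:

import math
import time

import pandas as pd

chunksize = int(4e7)  # chunk size from the question

# Option 1: iterate the TextFileReader and count the chunks
start = time.perf_counter()
reader = pd.read_csv('log_file.csv', low_memory=False, chunksize=chunksize)
chunks_v1 = sum(1 for chunk in reader)
t1 = time.perf_counter() - start

# Option 2: count the lines of the file and divide by the chunk size
# (the raw line count includes the header, so the result can be off by one
# when the number of data rows is an exact multiple of chunksize)
start = time.perf_counter()
with open('log_file.csv', 'r') as f:
    chunks_v2 = math.ceil(sum(1 for row in f) / chunksize)
t2 = time.perf_counter() - start

print(f"iterating the reader: {chunks_v1} chunks in {t1:.2f} s")
print(f"counting lines:       {chunks_v2} chunks in {t2:.2f} s")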
