I'm working on Linux. I would like to display the percentage of a file that has been parsed so far. After reading a bit, I decided that the most accurate way to do that would be to get the total size (in bytes) of the file I'm parsing, then add up the size (in bytes) of each line after reading it.

This is my dummy simplified code.

import os
import sys

if __name__ == '__main__':

    read_bytes = 0
    # myfile is the path of the file being parsed (defined elsewhere)
    total_file_size = os.path.getsize(myfile)

    with open(myfile, 'r') as input_file:
        for line in input_file:
            read_bytes += sys.getsizeof(line)

            print "do my stuff"

    print total_file_size
    print read_bytes

Output is:

193794194

203979278

Obviously something counted for each line is inflating the total size. I've tried with:

read_bytes += sys.getsizeof(line) - sys.getsizeof('\n')

And output is:

193794194

193309190

I must be missing something.

gmarco

2 Answers

1

Use len() instead of sys.getsizeof().

sys.getsizeof() returns the number of bytes the interpreter uses to hold the object, including per-object overhead, not the length of the string's contents:

>>> len('asdf')
4
>>> import sys
>>> sys.getsizeof('asdf')
37
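To make the overshoot concrete, here is a minimal sketch, assuming CPython 3 and bytes objects (the session above is from Python 2, where the exact numbers differ). The helper name `overhead` is made up for illustration. It suggests that the gap between sys.getsizeof() and len() is a fixed per-object cost, so summing getsizeof() over every line overshoots the file size by roughly that cost times the number of lines.

```python
import sys


def overhead(data):
    """Bytes that sys.getsizeof() reports beyond the payload length."""
    return sys.getsizeof(data) - len(data)


short_line = b"asdf\n"
long_line = b"x" * 1000 + b"\n"

# On CPython both values are the same fixed object header size,
# regardless of how long the line itself is.
print(overhead(short_line), overhead(long_line))
```

The exact header size depends on the interpreter version and on whether it is a 32-bit or 64-bit build, so it should not be hard-coded as a correction.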

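In addition, if you are running the program on Windows, you should use binary mode. In text mode, newline translation turns each '\r\n' into '\n', so the lengths of the lines you read no longer add up to the size on disk.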

open(myfile, 'rb')
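Putting both points together, a progress loop might look like the sketch below. It is written for Python 3, and the function name `parse_with_progress` and the throwaway demo file are assumptions for illustration, not part of the question's code.

```python
import os
import tempfile


def parse_with_progress(path):
    """Yield (line, percent_done) for each line of the file at `path`."""
    total = os.path.getsize(path)
    read_bytes = 0
    # Binary mode: lines come back exactly as stored, '\r\n' included.
    with open(path, "rb") as input_file:
        for line in input_file:
            read_bytes += len(line)  # payload bytes only, no object overhead
            yield line, 100.0 * read_bytes / total


# Demo on a temporary file with Windows-style line endings.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"first\r\nsecond\r\nthird\r\n")
demo_path = tmp.name

for line, percent in parse_with_progress(demo_path):
    print("%6.2f%% %r" % (percent, line))

os.remove(demo_path)
```

Because the lines are bytes here, anything that needs text has to decode them explicitly, for example with line.decode('utf-8') if that is the file's encoding.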

NOTE

With file.tell(), you don't need to track the current position yourself; it returns the file's current offset in bytes.
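A sketch of the tell() variant, assuming Python 3 and a file opened in binary mode; the name `parse_with_tell` is hypothetical. Two caveats apply elsewhere: in Python 3 text mode, tell() raises an error while the file is being iterated, and in Python 2 the file iterator's read-ahead buffer can make tell() run ahead of the lines actually delivered, so a readline() loop is the safer choice there.

```python
import os
import tempfile


def parse_with_tell(path):
    """Yield (line, percent_done), using tell() for the byte offset."""
    total = os.path.getsize(path)
    with open(path, "rb") as input_file:
        for line in input_file:
            # For a binary file, tell() is the offset just past this line.
            yield line, 100.0 * input_file.tell() / total


# Demo on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"alpha\nbeta\ngamma\n")
demo_path = tmp.name

for line, percent in parse_with_tell(demo_path):
    print("%6.2f%% %r" % (percent, line))

os.remove(demo_path)
```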

falsetru
  • You're totally right. I was so obfuscated trying to use getsizeof. Because each line is a string len(line) works perfect ! Thank you. – gmarco Aug 14 '13 at 14:15
  • @Guillermo, Using `file.tell()`, you don't need to calculate read count. – falsetru Aug 14 '13 at 14:30
0

replace:

read_bytes += sys.getsizeof(line) - sys.getsizeof('\n') 
with read_bytes += sys.getsizeof(line) - sys.getsizeof('\n') - 49 as 49 is ascii of '0'
Ahmet
DevilWhite
  • Please remember that: "Brevity is acceptable, but fuller explanations are better." [How-do-I-write-a-good-answer](https://stackoverflow.com/help/how-to-answer) – Ahmet Aug 28 '20 at 07:03
  • Ok :) I was just adding extra information if anybody scroll upto this! – DevilWhite Sep 30 '20 at 16:28