
What is the difference between `os.path.getsize(path)` and `os.stat`? Which one is best to use in Python 3, and when should each be used? Why do we have two solutions that do the same thing? I found this answer, but I couldn't understand what this quote means:

From this, it seems pretty clear that there is no reason to expect the two approaches to behave differently (except perhaps due to the different structures of the loops in your code)

Specifically, why do we have two approaches, and how do they differ?

Ali Tavallaie
  • If you are on Linux, you can run both versions under `strace` and compare the output. – RedEyed Sep 10 '17 at 20:23
  • The help output for `os.path.getsize?` says: `Signature: os.path.getsize(filename)`, `Docstring: Return the size of a file, reported by os.stat().`, `File: /usr/lib/python3.5/genericpath.py`, `Type: function`. So `os.path.getsize(path)` is only a wrapper around `os.stat()`. – RedEyed Sep 10 '17 at 20:25

2 Answers


`stat` is a POSIX system call (available on Linux and Unix, with an equivalent on Windows) that returns a bundle of information about a path (size, type, protection bits, ...).

Python has to call it at some point to get the size (and it does), but there's no system call to get only the size.

So the two are the same performance-wise: calling `os.stat` directly may be marginally faster, but the difference is only one extra Python function call, not any extra I/O. `os.path.getsize` is just simpler to write.

That said, before calling `os.path.getsize` you have to make sure that the path is actually a file. When called on a directory, `getsize` returns some value (tested on Windows), probably related to the size of the directory node, so you have to call `os.path.isfile` first, which is another call to `os.stat`.

In the end, if you want to maximize performance, call `os.stat` yourself, inspect the result to check whether the path is a file, then read its `st_size` field. That way you call `stat` only once.
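A minimal sketch of that single-`stat` approach follows. The helper name `file_size_or_none` and the temporary sample file are made up for illustration; `stat.S_ISREG` is the standard-library check for "regular file".

```python
import os
import stat
import tempfile


def file_size_or_none(path):
    """Hypothetical helper: size in bytes if path is a regular file, else None."""
    try:
        info = os.stat(path)  # the only stat call made for this path
    except OSError:
        return None  # missing path, permission problem, ...
    if stat.S_ISREG(info.st_mode):  # same kind of check os.path.isfile performs
        return info.st_size
    return None


# Demo on a throwaway directory containing one 10-byte file.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.bin")
    with open(sample_path, "wb") as fh:
        fh.write(b"x" * 10)
    size_of_file = file_size_or_none(sample_path)
    size_of_dir = file_size_or_none(tmp_dir)

print(size_of_file)  # 10
print(size_of_dir)   # None
```

Compared with `os.path.isfile(path)` followed by `os.path.getsize(path)`, this makes one `os.stat` call per path instead of two.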

If you're using `os.walk` to scan the directory, you're exposed to more hidden `stat` calls, so look into `os.scandir` (added in Python 3.5).
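As a rough illustration of the `os.scandir` route, the sketch below sums the sizes of the regular files directly inside one directory. The function name `total_file_size` and the sample files are assumptions for the demo; using `os.scandir` as a context manager requires Python 3.6 or later.

```python
import os
import tempfile


def total_file_size(directory):
    """Hypothetical helper: total bytes of regular files directly in directory."""
    total = 0
    with os.scandir(directory) as entries:  # context manager needs Python 3.6+
        for entry in entries:
            # DirEntry caches what the directory listing already provided,
            # so is_file() often needs no extra system call.
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


# Demo: two files (3 and 5 bytes) plus a subdirectory that is skipped.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "a.bin"), "wb") as fh:
        fh.write(b"abc")
    with open(os.path.join(tmp_dir, "b.bin"), "wb") as fh:
        fh.write(b"hello")
    os.mkdir(os.path.join(tmp_dir, "subdir"))
    demo_total = total_file_size(tmp_dir)

print(demo_total)  # 8
```

How many system calls this saves depends on the platform, since the information cached on each `DirEntry` differs between Windows and POSIX systems.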


Jean-François Fabre

The answer you are linking to shows that the one calls the other:

```python
def getsize(filename):
    """Return the size of a file, reported by os.stat()."""
    return os.stat(filename).st_size
```

so fundamentally, both functions are using os.stat.
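A quick way to check this yourself is to compare the two results on the same file. This is only a sketch; the temporary file and the variable names are made up for the demo.

```python
import os
import tempfile

# Create a small throwaway file so both calls have something to measure.
with tempfile.NamedTemporaryFile(delete=False) as fh:
    fh.write(b"hello")
    sample_path = fh.name

try:
    size_via_getsize = os.path.getsize(sample_path)
    size_via_stat = os.stat(sample_path).st_size
finally:
    os.remove(sample_path)  # clean up the demo file

print(size_via_getsize, size_via_stat)  # 5 5
```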

Why? Probably because they had similar needs in two different packages, path and stat, and didn't want to duplicate code.

Shawn Mehan