
Though I would imagine that append mode is "smart" enough to only insert the new bytes being appended, I want to make absolutely sure that Python doesn't handle it by re-writing the entire file along with the new bytes.

I am attempting to keep a running backup of a program log, and it could reach several thousand records in a CSV format.

Connor Spangler
  • This is probably up to the underlying system call. – juanpa.arrivillaga Nov 30 '20 at 03:10
  • Then assuming we're talking about a modern windows install? – Connor Spangler Nov 30 '20 at 03:11
  • What is actually going on I couldn't say, that is getting into details (kernel and hardware stuff) that are above my pay grade. I am practically certain that Python itself isn't merely copying the file to memory, truncating it, then re-writing it. I generally have faith that the kernel does things efficiently. But I'm interested in the answer to this too. – juanpa.arrivillaga Nov 30 '20 at 03:29
  • A related question: https://stackoverflow.com/questions/33495283/what-does-opening-a-file-actually-do – juanpa.arrivillaga Nov 30 '20 at 03:29

1 Answer

Python file operations are convenience wrappers over operating system file operations. The operating system either implements these file system operations internally or forwards them to a loadable module (plugin) or an external server (NFS, SMB). Most operating systems since the early 1970s have been able to append data to an existing file, and certainly all the ones that claim to be even remotely POSIX compliant.

POSIX append mode (the `O_APPEND` flag) opens the file for writing and positions the file pointer at the end of the file before each write. This means every write operation simply adds bytes past the current end of the file; the existing contents are not read or rewritten.
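As a rough illustration of the behavior described above, here is a minimal sketch that appends two records and checks the file size before and after each call. The helper name `append_record` and the file name `backup.csv` are made up for this example. It only shows that the file grows by exactly the appended bytes and that earlier content is preserved; it cannot by itself prove what the kernel or file system does internally.

```python
import os
import tempfile


def append_record(path, line):
    """Append one line to `path`; return (size_before, size_after) in bytes.

    Hypothetical helper for illustration only.
    """
    size_before = os.path.getsize(path) if os.path.exists(path) else 0
    # "a" opens in append mode; newline="" stops Python from translating
    # "\n" to "\r\n" on Windows, so the byte counts are the same everywhere.
    with open(path, "a", encoding="utf-8", newline="") as f:
        f.write(line)
    return size_before, os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "backup.csv")
    first = append_record(log_path, "id,message\n")
    second = append_record(log_path, "1,started\n")
    with open(log_path, encoding="utf-8", newline="") as f:
        contents = f.read()

print(first, second)
```

The second call starts from the size the first call left behind, and the header line written earlier is still intact when the file is read back.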

There might be a few exceptions to that. For example, some routine might use low-level system calls to move the file pointer backwards. Or the underlying file system might not be POSIX compliant and might use some form of transactional object storage, like AWS S3. But for any standard scenario I wouldn't worry about such cases.

However, since you mentioned backup as your use case, you need to be extra careful. Backups are not as easy as they seem on the surface. There are several things to worry about. Various caches might hold data in memory before it is written to disk. What will happen if the power goes out right after you have appended new records? Also, what will happen if somebody starts several copies of your program?
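For the caching concern specifically, one common approach is to flush Python's own buffer and then ask the operating system to push its cache to the storage device. The sketch below assumes that is acceptable for your write rate; `append_durably` is a hypothetical name, and `os.fsync` only requests a flush, so hardware write caches or network file systems may still delay the data. It does not address several copies of the program writing at once.

```python
import os
import tempfile


def append_durably(path, line):
    """Append `line` and ask the OS to commit it to storage.

    Hypothetical helper for illustration only.
    """
    with open(path, "a", encoding="utf-8", newline="") as f:
        f.write(line)
        f.flush()             # move data from Python's buffer to the OS
        os.fsync(f.fileno())  # request that the OS write its cache to disk


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "backup.csv")
    append_durably(log_path, "1,started\n")
    append_durably(log_path, "2,finished\n")
    with open(log_path, encoding="utf-8", newline="") as f:
        lines = f.readlines()

print(lines)
```

Calling `os.fsync` after every record is slow compared with buffered writes, so for a busy log it may be reasonable to sync only every few records and accept losing the most recent ones on a power failure.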

One last thing: unless you are running on a 1980s 8-bit computer, a few thousand CSV lines are nothing to modern hardware. Even if the whole file were loaded and written back, you wouldn't notice any difference.

Vlad
  • Great answer! In hindsight I really should have realized this was a POSIX behavior question first and foremost. As a follow-up, do you have any good resource to point me to in regards to backup paradigms? Thank you! – Connor Spangler Nov 30 '20 at 03:43