In the two interactive sessions below, I'd expect both to produce the same file contents, but the second one writes at the end of the file. The only difference is a read call AFTER the write. I don't understand what's happening. What am I missing?

Expected behavior:

>>> f = open("test.txt","w+")
>>> f.write('0123456789')
10
>>> f.seek(0)
0
>>> f.read(3)
'012'
>>> f.seek(0,1)
3
>>> f.write('XX')
2
>>> f.seek(0)
0
>>> f.read()
'012XX56789'
>>> f.close()

Unexpected behavior:

>>> f = open("test.txt","w+")
>>> f.write('0123456789')
10
>>> f.seek(0)
0
>>> f.read(3)
'012'
>>> f.seek(0,1)
3
>>> f.write('XX')
2
>>> f.read(2)
'34'
>>> f.seek(0)
0
>>> f.read()
'0123456789XX'
>>> f.close()

As you can see, XX was written after the whole line, even though I was at position 3 when writing these characters.

Chris Maes
Markus Steiner
  • 1
    Welcome to StackOverflow. Did the file exist before the first interaction? Did it exist before the second interaction? The w+ argument means write and append to the file. – rajah9 Sep 02 '19 at 11:23
  • I am able to reproduce this strange behaviour with python 3.6.5, both with a previously existing file and a non-existing file... seems like a python bug to me. – Chris Maes Sep 02 '19 at 11:25
  • 2
    Looks like a buffering-related bug to me. Adding `f.flush()` after the `write('XX')` makes it work as expected. – Aran-Fey Sep 02 '19 at 11:45
  • 1
    Same problem with your second code. I suspected that there could be something related to text mode and tried it in binary mode: it works as expected in this case. – Thierry Lathuille Sep 02 '19 at 11:45
  • 5
    An answer to a [related question](https://stackoverflow.com/a/783843/9609843) has some interesting info: it contains a [link](https://mail.python.org/pipermail/python-bugs-list/2005-August/029886.html) where it is stated that _the effect of mixing reads with writes on a file open for update is entirely undefined unless a file-positioning operation occurs between them (for example, a seek())._ – sanyassh Sep 02 '19 at 11:50
  • 1
    @sanyassh Inserting `f.seek(0,1)` between the write and the read gives the expected behaviour. – Thierry Lathuille Sep 02 '19 at 11:55

1 Answer

What happened is that the write was buffered, and the intervening read advanced the underlying file position to the end of the file (since the file is small) before the write was committed (flushed). If a seek follows the write instead, the write buffer is committed (at the right place) before the seek actually takes place. This approach avoids the overhead of checking for pending writes on every read and has long been specified by POSIX.
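
A minimal sketch of the workaround mentioned in the comments, assuming Python 3 and a file opened in text mode: put a file-positioning call such as `f.seek(0, 1)` (or an `f.flush()`) between the write and the read that follows it. The helper name `overwrite_then_read` and its `resync` flag are hypothetical, introduced only for illustration, and a temporary directory stands in for wherever test.txt lives.

```python
import os
import tempfile


def overwrite_then_read(resync):
    """Replay the session from the question and return the final file contents.

    When resync is true, a seek(0, 1) is issued between the write and the
    following read, so the pending write is committed at offset 3 first.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.txt")
        with open(path, "w+") as f:
            f.write("0123456789")
            f.seek(0)
            f.read(3)          # consume '012'
            f.seek(0, 1)       # sync the underlying position with offset 3
            f.write("XX")      # buffered, not yet committed to the file
            if resync:
                f.seek(0, 1)   # positioning call: commits the write in place
            f.read(2)          # read directly after the write
            f.seek(0)
            return f.read()


print(overwrite_then_read(resync=True))  # 012XX56789
```

Calling the helper with `resync=False` replays the problematic sequence from the question; its result may depend on the Python version, so it is not shown here.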

Davis Herring
  • You make it sound normal, but in my opinion it is not. Buffering is supposed to be transparent: it shouldn't alter the program's semantics, only its performance. That second sequence would normally play out differently in another language, so I think this is a Python implementation peculiarity, and I wasn't aware of it. Now that I know, I'll be careful. Still, if you don't know about it, it could be the source of nasty bugs. – Markus Steiner Sep 03 '19 at 02:58
  • @MarkusSteiner: Opening for update with buffering is considered to be a special case, and Python just defers to C for its semantics. You’re welcome to disagree with the design decision, of course! – Davis Herring Sep 03 '19 at 03:11