Having a text file like this:

line one
line two
line three

And running the following code:

with open('file', 'r+') as f:
    print(f.tell())
    print(f.readline().strip())
    print(f.tell())
    # f.seek(f.tell())
    f.write('Hello')
    print(f.tell())

Results in the word "Hello" being written at the very end of the file:

line one
line two
line threeHello

I expected the write to start at the position of the last character read (right after line one), but it doesn't unless I uncomment f.seek(f.tell()). There may be some fundamentals I'm missing, but I can't find anything in the Python documentation that explains in depth how this works. What's happening here, and what makes it write the word there? And why doesn't this happen if I start writing without reading first?

The printed values for f.tell() are the following:

0
9
39
M I P A

2 Answers

This looks to be a bug in how io.TextIOWrapper (the class returned by open in text mode) interacts with io.BufferedRandom (the class it wraps in the + modes).

If you change your test case to operate in binary mode:

with open('file', 'rb+') as f:
    print(f.tell())
    print(f.readline().strip())
    print(f.tell())
    # f.seek(f.tell())
    f.write(b'Hello')
    print(f.tell())

the behavior is identical regardless of whether or not the superfluous f.seek(f.tell()) is included.

The problem appears to be caused by the multiple layers of buffering involved. What you get back is an io.TextIOWrapper wrapping an io.BufferedRandom (which in turn wraps an io.FileIO). The TextIOWrapper reads chunks from the io.BufferedRandom to amortize the cost of decoding from bytes to text, so when you call readline, it's actually consuming and decoding your whole file (it's so small it fits in one chunk), leaving BufferedRandom positioned at the end of the file (even though logically it should only be midway through, and TextIOWrapper.tell reports a position corresponding to that logical position).
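
The layering and the two diverging positions can be inspected directly. This is only a sketch on CPython: the temporary file and the variable names are assumptions for illustration, while f.buffer and f.buffer.raw are the documented attributes that expose the wrapped objects.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    # Written as bytes so the file is 28 bytes on every platform.
    with open(path, "wb") as f:
        f.write(b"line one\nline two\nline three")

    with open(path, "r+", encoding="utf-8") as f:
        # Text layer -> buffered layer -> raw file layer.
        layers = (
            type(f).__name__,
            type(f.buffer).__name__,
            type(f.buffer.raw).__name__,
        )
        f.readline()
        logical_pos = f.tell()          # position the text layer reports
        buffered_pos = f.buffer.tell()  # position of the layer underneath

print(layers)
print(logical_pos, buffered_pos)
```

The text layer reports the position just after the first line, while the buffered layer underneath has already moved to the end of this small file.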

When you turn around and write, the TextIOWrapper encodes the data and passes it along to BufferedRandom, which still believes itself to be at the end of the file; since TextIOWrapper doesn't correct this, the data gets tacked on to the end. The seemingly no-op f.seek(f.tell()) resynchronizes the TextIOWrapper with the underlying BufferedRandom to get the expected behavior. It shouldn't really be necessary (I recommend filing a bug to ensure writes go to the logical tell position, as I can't find an existing bug, though "Python 3 f.tell() gets out of sync with file pointer in binary append+read mode" is superficially similar), but at least the workaround is relatively simple.

ShadowRanger

The problem has to do with buffered I/O.

The open() function seems to return a buffered file handle.

So whenever something is read from the file, at least an entire buffer is read in, which on my machine seems to be about 8k (8192) bytes. This is done to optimize performance.

So readline() will read one block, return the first line, and keep the rest in a buffer for potential future reads.

f.tell() gives you the position relative to the bytes that were already returned by readline().

Thus you can force the write pointer to the place you intended with f.seek(f.tell()). Without the explicit seek, you will write after the buffered data.
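
The gap between what readline() has returned and what has already been consumed from the file can be measured. A rough sketch, assuming a temporary file larger than one read chunk; the variable names are illustrative, and the exact read-ahead size depends on the Python build, so it is printed rather than stated.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")
    # 1000 lines of 80 bytes each, written as bytes for a fixed layout.
    with open(path, "wb") as f:
        f.write((b"*" * 79 + b"\n") * 1000)
    file_size = os.path.getsize(path)

    with open(path, "r+", encoding="utf-8") as f:
        f.readline()
        returned_pos = f.tell()          # bytes handed back by readline()
        consumed_pos = f.buffer.tell()   # bytes already pulled from below

read_ahead = consumed_pos - returned_pos
print(returned_pos, consumed_pos, read_ahead, file_size)
```

The first number is the length of one line, and the second shows how far the underlying layer has read ahead; a write without a seek would start there.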

The following script illustrates this. Note that I also experimented with the buffering parameter: according to the documentation, 1 means line buffering, but I don't see any change in behavior:

with open("file", "w") as f:
    f.write(("*" * 79 +"\n") * 1000)

with open('file', 'r+', buffering=1) as f:
    print(f.tell())
    print(f.readline().strip())
    print(f.tell())
    # f.seek(f.tell())
    f.write('Hello')
    print(f.tell())

print("----------- file contents")
with open("file", "r") as f:
    print(f.read())
print("----------- END")

So if you write after a readline(), the new data is written after the chunk that was read into the buffer.

f.tell(), on the other hand, tells you how many bytes have already been returned.

The output will be:

0
*******************************************************************************
80
8197
8202
----------- file contents
*******************************************************************************
*******************************************************************************
...
*******************************************************************************
********************************HelloHello*************************************
*******************************************************************************
*******************************************************************************
*******************************************************************************
*******************************************************************************
...
gelonida