1

Consider:

with open('test.txt', 'w') as f:
    for i in range(5):
        f.write("Line {}\n".format(i))

with open('test.txt', 'r') as f:
    f.readline()
    for line in f.readlines():
        print(line.strip())

This outputs

Line 1
Line 2
Line 3
Line 4

That is, f keeps an internal position: f.readline() consumes the first line, and f.readlines() then reads all the remaining lines up to the end of the file. Is this behaviour expected and guaranteed by the language?

The only information I found is from docs.python.org,

If you want to read all the lines of a file in a list you can also use list(f) or f.readlines().

which I feel is ambiguous.

Mark Amery
pingul

3 Answers

2

When the documentation mentions that trick, it assumes that you haven't already read from the file.

Yes, this is expected (and useful, when you want to skip a title line for instance, then read the rest of the lines).

If you want to be sure to read all the lines, just rewind the file before calling readlines:

f.seek(0)
lines = f.readlines()
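
A minimal sketch of both cases side by side. It uses io.StringIO as a stand-in for a file opened in text mode, which is an assumption made only to keep the example self-contained; the names header, rest and everything are illustrative:

```python
import io

# io.StringIO stands in for a text file here (an assumption for a self-contained demo).
f = io.StringIO("".join("Line {}\n".format(i) for i in range(5)))

header = f.readline()       # consumes "Line 0\n" and advances the stream position
rest = f.readlines()        # reads from the current position to the end

f.seek(0)                   # rewind to the start of the stream
everything = f.readlines()  # now all five lines come back

print(header.strip(), len(rest), len(everything))  # Line 0 4 5
```

Skipping a title line is just the first half of this: call readline() once, then readlines().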

The documentation says little about readlines not rewinding the file. I did quite a lot of googling; it just seems to be implied and natural. If you're still not convinced, take a look at the source code (bytesio.c from the Python 3.6.1 source):

static PyObject *
_io_BytesIO_readlines_impl(bytesio *self, PyObject *arg)
/*[clinic end generated code: output=09b8e34c880808ff input=691aa1314f2c2a87]*/
{
    Py_ssize_t maxsize, size, n;
    PyObject *result, *line;
    char *output;

    CHECK_CLOSED(self);

    if (PyLong_Check(arg)) {
        maxsize = PyLong_AsSsize_t(arg);
        if (maxsize == -1 && PyErr_Occurred())
            return NULL;
    }
    else if (arg == Py_None) {
        /* No size limit, by default. */
        maxsize = -1;
    }
    else {
        PyErr_Format(PyExc_TypeError, "integer argument expected, got '%s'",
                     Py_TYPE(arg)->tp_name);
        return NULL;
    }

    size = 0;
    result = PyList_New(0);
    if (!result)
        return NULL;

    output = PyBytes_AS_STRING(self->buf) + self->pos;
    while ((n = scan_eol(self, -1)) != 0) {
        self->pos += n;

I stopped pasting just after the line-reading loop starts. On the line just before the loop, we see that the code uses the object's current self->pos value, and nothing resets it at the beginning of the function.
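
The same position can be watched from Python with tell(). This is a rough sketch on io.BytesIO, the class that the C code above implements; the variable names are illustrative and the byte counts assume ASCII data with "\n" line endings:

```python
import io

# Five lines of 7 bytes each: b"Line 0\n" ... b"Line 4\n".
data = b"".join("Line {}\n".format(i).encode("ascii") for i in range(5))
f = io.BytesIO(data)

pos_start = f.tell()        # 0: nothing has been read yet
f.readline()
pos_after_first = f.tell()  # 7: len(b"Line 0\n")
remaining = f.readlines()   # starts at position 7, not at 0
pos_end = f.tell()          # 35: the end of the buffer

print(pos_start, pos_after_first, pos_end, len(remaining))  # 0 7 35 4
```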

Jean-François Fabre
1

Two reasons to believe that readlines() reading from the current position instead of from the beginning of the file is 'guaranteed':

  1. Per the docs, open returns a file object, which per the glossary means something that implements the contract defined in the io module. The io module docs tell us that .readlines() will

    Read and return a list of lines from the stream.

    Note also that the term "stream position" is used frequently throughout the io docs. I suppose I have to admit that the docs don't 100% unambiguously and explicitly say that readlines() will start reading from the current stream position rather than from the beginning of the file (or the middle, or a random position, or a position that varies depending upon the day of the week). However, I think it's fair to say that - given that it's established in the io docs that streams have positions - any interpretation other than reading from the current stream position would be perverse, even if we didn't have any real-life implementations to look at.

  2. It's what CPython does, and CPython is widely understood to be Python's official reference interpreter (as noted in the docs at, for example, https://docs.python.org/devguide/#other-interpreter-implementations).

Maybe that argument isn't quite as formal or rigorous as an equivalent argument based on the specs of, say, C, C++, or ECMAScript could be. If that troubles you, then too bad, because you're not going to find that level of formality in the Python world. Python's docs are its specification, but they're also documentation meant for ordinary developers working in the language, and as a consequence they don't define behaviour quite as pedantically as the formal standards of other languages tend to. When in doubt, interpret the docs in the most natural way and presume that Python will follow the principle of least astonishment, and if that doesn't provide enough certainty, trust the CPython reference interpreter.
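
One way to see what the interpreter you are running actually does is to try it on several kinds of stream. The sketch below is only an empirical check, not a proof of any guarantee; the helper tail_after_first_line and the variable names are made up for illustration:

```python
import io
import os
import tempfile

text = "".join("Line {}\n".format(i) for i in range(5))

def tail_after_first_line(stream, reader):
    """Consume one line, then collect the rest with `reader` (list or readlines)."""
    stream.seek(0)
    stream.readline()
    return [line.strip() for line in reader(stream)]

results = []

# In-memory text and binary streams.
for stream in (io.StringIO(text), io.BytesIO(text.encode("ascii"))):
    results.append(tail_after_first_line(stream, list))
    results.append(tail_after_first_line(stream, lambda s: s.readlines()))

# A real file on disk, created in a temporary location and removed afterwards.
fd, path = tempfile.mkstemp(suffix=".txt")
try:
    with os.fdopen(fd, "w") as f:
        f.write(text)
    with open(path, "r") as f:
        results.append(tail_after_first_line(f, list))
        results.append(tail_after_first_line(f, lambda s: s.readlines()))
finally:
    os.remove(path)

# Normalise bytes to str so that all results can be compared directly.
normalised = [
    [item.decode("ascii") if isinstance(item, bytes) else item for item in result]
    for result in results
]
expected = ["Line 1", "Line 2", "Line 3", "Line 4"]
all_match = all(result == expected for result in normalised)
print(all_match)
```

On CPython 3 this prints True: both list(f) and f.readlines() pick up from the current stream position for each of the three stream types.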

Mark Amery
-2

That's what readlines() is supposed to do.

alex