
Why does the following code use 1.2GB of memory to execute? I was expecting memory usage to be relatively flat regardless of the number passed to generate_series, instead it is steadily increasing. Please - tell me I'm doing something wrong!

if (!PQsendQuery(conn, "select generate_series(1, 10000000)"))
    exit(1);

PGresult *res;
int i, value = 0;
while ((res = PQgetResult(conn)) != NULL) {
    for (i = 0; i < PQntuples(res); i++) {
        value = atoi(PQgetvalue(res, i, 0));
    }
    PQclear(res);
}

printf("%d\n", value);
PQfinish(conn);

I've put the full source code for this example on pastebin.

Greg Lowe
    Looks like this has been fixed in 9.2: http://www.postgresql.org/docs/9.2/static/libpq-single-row-mode.html – Greg Lowe Mar 30 '13 at 11:42
  • It seems to be `psql` that [eats the memory](http://www.postgresql.org/message-id/BANLkTimXi6MOspxKBZ4yb+EdN-mRQkm1DA@mail.gmail.com) until it has returned the whole result at once. Try using `\set FETCH_COUNT 1000`, for example. – Houari Mar 30 '13 at 11:49
  • The `for (i = 0; i < PQntuples(res); i++) {}` loop assumes that `PQntuples(res)` is computable, so libpq *must have buffered* the entire query result before returning it to the caller. – wildplasser Mar 30 '13 at 12:05
  • I assumed that PQntuples() returned the number of rows currently buffered, and that each call to PQgetResult() would fill the buffer with a chunk of the rows. However, this is disabled by default. A call to PQsetSingleRowMode() fixes it. I just tried it: I'm now down to a couple of MB, instead of GB ;) – Greg Lowe Mar 30 '13 at 12:18

2 Answers

It appears that by default libpq buffers the entire result, rather than reading it in chunks.

In 9.2 there is a way to change this behaviour; see single-row mode.

I've tried this out: adding a call to PQsetSingleRowMode(conn) directly after PQsendQuery() drops memory usage to a few MB. Problem solved!
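For reference, a minimal sketch of what this looks like (assuming an already-open `PGconn *conn`; the query and error handling are simplified, and running it obviously needs a live server):

```c
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Stream a large result row by row instead of buffering it all in memory. */
static int sum_rows(PGconn *conn)
{
    if (!PQsendQuery(conn, "select generate_series(1, 10000000)"))
        return -1;

    /* Must be called immediately after PQsendQuery, before any PQgetResult. */
    if (!PQsetSingleRowMode(conn))
        return -1;

    PGresult *res;
    long rows = 0;
    while ((res = PQgetResult(conn)) != NULL) {
        /* Each row arrives as a PGRES_SINGLE_TUPLE result holding one tuple;
           the final result is an empty PGRES_TUPLES_OK. */
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
            rows += PQntuples(res);
        PQclear(res);
    }
    printf("rows seen: %ld\n", rows);
    return 0;
}
```

Note that in single-row mode each PQgetResult returns one row, so memory use stays bounded by a single tuple rather than the whole result set.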

Greg Lowe

The canonical way of dealing with potentially large result sets is to declare a CURSOR for the query and execute successive FETCH calls to retrieve the rows in chunks.

That's also what psql does when the FETCH_COUNT variable is set.
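A rough sketch of the cursor approach with libpq (assuming an open `PGconn *conn`; the cursor name and batch size are arbitrary, and cursors must run inside a transaction):

```c
#include <stdlib.h>
#include <libpq-fe.h>

static void fetch_in_chunks(PGconn *conn)
{
    /* Cursors only live inside a transaction block. */
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn,
        "DECLARE series_cur CURSOR FOR select generate_series(1, 10000000)"));

    for (;;) {
        /* Pull 1000 rows per round trip; memory is bounded by the batch size. */
        PGresult *res = PQexec(conn, "FETCH 1000 FROM series_cur");
        if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0) {
            PQclear(res);       /* error, or cursor exhausted */
            break;
        }
        for (int i = 0; i < PQntuples(res); i++) {
            int value = atoi(PQgetvalue(res, i, 0));
            (void)value;        /* process the row here */
        }
        PQclear(res);
    }

    PQclear(PQexec(conn, "CLOSE series_cur"));
    PQclear(PQexec(conn, "COMMIT"));
}
```

Unlike single-row mode this works on pre-9.2 servers, at the cost of one extra round trip per batch.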

Daniel Vérité
  • Any idea what the pros and cons are of using a cursor vs single row mode? (Other than it only working in 9.2+, which is a obviously a big con.) – Greg Lowe Mar 31 '13 at 09:19
  • @xxgreg: with a cursor, the query may be planned differently to get the first results faster, depending on [cursor_tuple_fraction](http://www.postgresql.org/docs/current/static/runtime-config-query.html#GUC-CURSOR-TUPLE-FRACTION) – Daniel Vérité Apr 01 '13 at 10:25