2

I am currently implementing an adapter for PostgreSQL using libpq, and I was wondering if the single-row mode as set through PQsetSingleRowMode (https://www.postgresql.org/docs/9.6/static/libpq-single-row-mode.html) was having known performance issues?

I was under the impression that in this mode, libpq would use a more intelligent/efficient buffering than what can be achieved with a cursor and explicit FETCH.

However, when running 10x a simple query ("select id from mytable", returning 5000 records). With PostgreSQL 9.6.3 (on both client and server side), I am observing the following performance patterns:

  • PQexecParams (all at once): 78 msec
  • PQsetSingleRowMode: 3047 msec (most of it spent in PQgetResult)
  • PQexecParams + FETCH 1 : 3313 msec (most of it spent in PQgetResult)
  • PQexecParams + FETCH 10 : 219 msec

So the single-row mode appears to be barely more efficient than creating a cursor and fetching row-by-row... Is that correct behaviour? or is there some other option to control single-row mode?

(in my usage cases, fetching all at once would be risky in terms of memory usage for some queries, so the choice is really between single-row mode or fetching a few records at once)

Extra question: is there a way to abort fetching a single-row mode mid-way easily? When using a cursor+fetch this is trivial, but for single-row mode, it appears you either need to get all results or use PQcancel)

Addendum: Had another run with a profiler, turns out the calls PQgetResult makes to malloc and free are the bottlenecks (about 60% and 30% of CPU time respectively), both functions come from MSVCR120 AFAICT (this is under Win 10, server is localhost). I am using the libpq.dll from the "official" PostgreSQL zip. Interestingly enough, running other queries before the benchmark can "sometimes" make the issue disappear. Looks like PQgetResult is hitting a weak spot of malloc/free.

Eric Grange
  • 5,931
  • 1
  • 39
  • 61

2 Answers2

1

That is surprising.

While you can certainly expect a certain overhead for repeated calls to PQgetResult, it shouldn't be as much as you observe.

I ran a test with a query that returns 5 million rows with a database on localhost, and single row mode required slightly more than twice the CPU user time (total execution time was dominated by database server processing time).

3 seconds for 5000 rows sounds fishy, maybe there's something else going on.

Try profiling the executable to see where the time is spent.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Had another run with a profiler, turns out the calls PQgetResult makes to malloc and free are the bottlenecks (about 60% and 30% of CPU time respectively), both functions come from MSVCR120 (this is under Win 10, server is localhost). I am using the libpq.dll from the "official" PostgreSQL zip. Interestingly enough, running other queries before the benchmark can sometimes make the issue disappear. Looks like PQgetResult is hitting on a weak spot of malloc/free. – Eric Grange Jun 23 '17 at 15:24
  • For what it's worth, I just had a situation with _very_ slow operation on PQgetResult with PGsetSingleRowMode and a query in the style of `select ..., (select max(Something) from Table2 where Table2.Item1ID=Table1.ID),... from Table1` I fixed it by moving the subquery with max into the form section. – Stijn Sanders May 16 '19 at 20:14
  • @StijnSanders That's totally unrelated. That is simply an inefficient query, and it takes a long time to compute the result. – Laurenz Albe May 17 '19 at 02:31
  • Yes, but is it acceptable that PQsendQuery returns fast, and this computation happens when calling PQgetResult? If you're expecting to use PQsetSingleRowMode to stream in your results, and see them trickle in slowly, it typically will result in a bad user experience. – Stijn Sanders May 18 '19 at 14:16
  • The user experience wouldn't be better if sending the query blocked until a result is available. Long running queries make users suffer, no matter how you handle them. – Laurenz Albe May 20 '19 at 05:54
0

You were wondering if others also get poor performance with single row mode: Yes, me too, with postgresql 10 under Ubuntu Linux.

I submitted a query returning ~3 million texts. (Once as the result of a function, once just "SELECT primary_key FROM table".) Switching PQsetSingleRowMode() on is just as fast (in fact, 1% slower) than normal PGexec(). Also, the RAM usage of the postgresql-server show no significant difference (even though I do PQclear() on every obtained row).

When the documentation says that PGexec() "can be unworkable for commands that return a large number of rows", maybe they just mean that for your application it might be impractical to get so many results at a time.

See https://www.postgresql.org/docs/current/libpq-single-row-mode.html and https://www.postgresql.org/docs/current/libpq-async.html . For cancellation, https://www.postgresql.org/docs/current/libpq-cancel.html .

If others want to benchmark, here is my C++ code:

void outputFirstColumn(PGconn* conn,
    const string& sql_query,
    bool request_single_row_mode
)
{
  if(! PQsendQuery(conn, sql_query.c_str()))
    throw runtime_error("(E) command could not be dispatched successfully: "
      + string(PQerrorMessage(conn)));

  const int m = request_single_row_mode ? PQsetSingleRowMode(conn) : 0;
  cerr << "single-row-mode activated? " << (m == 1 ? "yes" : "no")<< endl;

  PGresult* res;
  while((res = PQgetResult(conn)))
  {
    const int rows = PQntuples(res);
    for(int r = 0; r < rows; ++r)
      cout << PQgetvalue(res, r, 0) << "\n";

    PQclear(res);
  }
}
user1239014
  • 111
  • 3