Searched high and low for this one, and haven't managed to find a definitive answer. I'm just getting started architecting an application; we've decided to use PostgreSQL for our database and C for the server. Some initial tests are showing some confusing numbers, however. It seems as though accessing the database via text strings is the fastest method, which seems counterintuitive to me. Surely the database stores ints and floats internally as binary and has to convert them to text to present them, and I then have to convert that text back to binary in order to use it. It seems the faster route would be to just leave everything as binary, but to date I can't get binary access to show this benefit.

Here's a simple test case that I built to show what I am seeing (schema + code)

http://pastebin.com/1AhB8avx

And I'm seeing timings like this:

Starting iteration 0...
Iteration 0: fetched 50000 binary records via PQparamExec in 35 seconds
Iteration 0: fetched 50000 binary records via PQexecf in 34 seconds
Iteration 0: fetched 50000 text records via PQexec in 25 seconds

Additional iterations just repeat that pattern, more or less. I also tried using plain libpq directly, with similar results, but didn't port that into this test case; a rough sketch of what I mean is below.
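For context, this is roughly the kind of text-versus-binary comparison I'm talking about, as a stripped-down plain-libpq sketch (not the pastebin code; the table t and its int4 column id are just placeholders for my real schema):

/*
 * Minimal sketch (not the pastebin code): fetch the same rows once in
 * text format and once in binary format with plain libpq. The table
 * name "t" and the int4 column "id" are placeholders.
 * Build with: cc fetch_demo.c -lpq
 */
#include <arpa/inet.h>   /* ntohl() */
#include <libpq-fe.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void fetch(PGconn *conn, int binary)
{
    /* last argument: resultFormat, 0 = text, 1 = binary */
    PGresult *res = PQexecParams(conn, "SELECT id FROM t",
                                 0, NULL, NULL, NULL, NULL, binary);
    if (PQresultStatus(res) != PGRES_TUPLES_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        PQclear(res);
        return;
    }

    long sum = 0;
    for (int i = 0; i < PQntuples(res); i++) {
        if (binary) {
            /* binary int4 arrives as 4 bytes in network byte order */
            uint32_t raw;
            memcpy(&raw, PQgetvalue(res, i, 0), sizeof raw);
            sum += (int32_t) ntohl(raw);
        } else {
            /* text format: the server rendered the int as a string,
             * so the client has to parse it back */
            sum += atoi(PQgetvalue(res, i, 0));
        }
    }
    printf("%s: sum = %ld over %d rows\n",
           binary ? "binary" : "text", sum, PQntuples(res));
    PQclear(res);
}

int main(void)
{
    PGconn *conn = PQconnectdb("");   /* connection from PG* env vars */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }
    fetch(conn, 0);   /* text */
    fetch(conn, 1);   /* binary */
    PQfinish(conn);
    return 0;
}

The only thing that differs between the two calls is the resultFormat argument and the per-value conversion on the client side.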

Is this in line with your experience, or am I just being stupid somewhere?

Thanks!

Teeleton
  • I took a quick look at your code, and it appears you are doing "row-at-a-time" processing; effectively you are benchmarking your network. Every record you fetch does one round trip to the database (2 × 1..10 ms + latency); the processing itself is nothing compared to this I/O overhead (2 system calls + 2 network I/Os *per row*). For comparison, a single query could easily spit out a result set of 100K records per second, in one round trip (see the sketch just after these comments). – wildplasser Dec 11 '12 at 00:43
  • That's really just a function of the simple test case. The actual application joins several tables and returns tens of rows, but the results are the same. The database is also local to the server and communicates over a UNIX-domain socket (not a TCP socket). And for all intents and purposes, any overhead should be identical for all three cases, since they all follow the same path in the database; the only thing that changes is the API method used to return the data. – Teeleton Dec 11 '12 at 01:59
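To illustrate the round-trip point above, here is a rough sketch (again with the placeholder table t and int4 column id): N single-row queries pay N client/server round trips, while one set-returning query pays that cost once.

/*
 * Sketch of the round-trip contrast, assuming a placeholder table t(id int4).
 * Build with: cc roundtrip_demo.c -lpq
 */
#include <libpq-fe.h>
#include <stdio.h>

/* one round trip per row: network latency dominates */
static void row_at_a_time(PGconn *conn, int nrows)
{
    for (int id = 1; id <= nrows; id++) {
        char sql[64];
        snprintf(sql, sizeof sql, "SELECT id FROM t WHERE id = %d", id);
        PGresult *res = PQexec(conn, sql);
        /* ... use the single row ... */
        PQclear(res);
    }
}

/* one round trip for the whole result set */
static void set_at_a_time(PGconn *conn)
{
    PGresult *res = PQexec(conn, "SELECT id FROM t");
    for (int i = 0; i < PQntuples(res); i++) {
        /* ... use PQgetvalue(res, i, 0) ... */
    }
    PQclear(res);
}

int main(void)
{
    PGconn *conn = PQconnectdb("");   /* connection from PG* env vars */
    row_at_a_time(conn, 50000);
    set_at_a_time(conn);
    PQfinish(conn);
    return 0;
}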

1 Answer

There doesn't seem to be a reason for such a difference, and in fact, running your code on my Ubuntu desktop, I don't see it:

Starting iteration 0...
Iteration    0: fetched 50000 binary records via PQparamExec in 13 seconds
Iteration    0: fetched 50000 binary records via PQexecf in 13 seconds
Iteration    0: fetched 50000 text records via PQexec in 12 seconds

Starting iteration 1...
Iteration    1: fetched 50000 binary records via PQparamExec in 13 seconds
Iteration    1: fetched 50000 binary records via PQexecf in 14 seconds
Iteration    1: fetched 50000 text records via PQexec in 12 seconds

Starting iteration 2...
Iteration    2: fetched 50000 binary records via PQparamExec in 12 seconds
Iteration    2: fetched 50000 binary records via PQexecf in 13 seconds
Iteration    2: fetched 50000 text records via PQexec in 11 seconds

Starting iteration 3...
Iteration    3: fetched 50000 binary records via PQparamExec in 12 seconds
Iteration    3: fetched 50000 binary records via PQexecf in 12 seconds
Iteration    3: fetched 50000 text records via PQexec in 12 seconds

Starting iteration 4...
Iteration    4: fetched 50000 binary records via PQparamExec in 13 seconds
Iteration    4: fetched 50000 binary records via PQexecf in 12 seconds
Iteration    4: fetched 50000 text records via PQexec in 12 seconds

Note that since the code calls time() twice to measure the elapsed time, and time() only has one-second resolution, the measurement can be off by a couple of seconds, which is a lot in this context. You should consider gettimeofday() instead, or better yet clock_gettime() with CLOCK_PROCESS_CPUTIME_ID.
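For example, something along these lines (a bare sketch; CLOCK_MONOTONIC here gives wall-clock elapsed time, and you would swap in CLOCK_PROCESS_CPUTIME_ID to count only the CPU time of your process):

/* Minimal timing sketch: wrap the fetch loop in clock_gettime() calls.
 * CLOCK_MONOTONIC measures wall-clock elapsed time; use
 * CLOCK_PROCESS_CPUTIME_ID to count only CPU time spent by the process.
 * Older glibc may need -lrt at link time. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);

    /* ... run the 50000-row fetch being benchmarked here ... */

    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ms = (end.tv_sec - start.tv_sec) * 1e3
                      + (end.tv_nsec - start.tv_nsec) / 1e6;
    printf("elapsed: %.3f ms\n", elapsed_ms);
    return 0;
}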

Daniel Vérité
  • Interesting... On my system, the difference is pronounced enough (and variable enough) that trying to measure to the millisecond seemed like overkill. Even in your output, the text queries are consistently the fastest of the three, even if the margin is much smaller. Perhaps I need to look at my build configuration for PostgreSQL. – Teeleton Dec 12 '12 at 04:14