7

I'm trying to optimise a PostgreSQL 8.4 query. After greatly simplifying the original query, trying to figure out what's making it choose a bad query plan, I got to the point where running the query under EXPLAIN ANALYZE takes only 0.5s, while running it normally takes 2.8s. It seems obvious then, that what EXPLAIN ANALYZE is showing me is not what it normally does, so whatever it's showing me is useless, isn't it? What is going on here and how do I see what it's really doing?

EMP
  • 59,148
  • 53
  • 164
  • 220
  • 2
    Is the query returning lots of data? My understanding is that `EXPLAIN ANALYZE` discards the data -- perhaps you're gaining back time not having to transfer it through a pipe or network connection? – Thanatos Aug 06 '10 at 02:26
  • About 75,000 rows so I wouldn't say "lots". Certainly shouldn't take much time on a LAN. – EMP Aug 06 '10 at 02:27
  • 1
    Apparently that's enough data that it takes about 1.3s (which would be about 16.25MB or approx 220KB/row) if you're achieving a transfer rate of 100Mbps – Mark Elliot Aug 06 '10 at 02:30
  • 1
    No, the rows are very small. More like 50 bytes per row. – EMP Aug 06 '10 at 03:49

2 Answers2

5

Most likely, the data pages are in the OS disk cache when you are manually running with EXPLAIN ANALYZE in order to try and optimize the query. When run in a normal environment, the pages probably aren't in the cache already and have to be fetched from disk, increasing the runtime.

Matthew Wood
  • 16,017
  • 5
  • 46
  • 35
  • 1
    I don't understand - if they were in the cache when I ran EXPLAIN ANALYZE then why aren't they in there when I run without EXPLAIN immediately after? – EMP Aug 08 '10 at 23:41
  • 9
    Sorry, I misunderstood the order. Now, I would say that it's more likely that the difference is network throughput. I recommend adding a LIMIT clause and trying varying amounts of records (like 1,5,10,100,1000,10000, etc) until you reach your max and compare the times. I'm guessing it will scale roughly as "a+(t*n)" where a is your EXPLAIN ANALYZE time, t is a rough constant of rows per second transferred and n is your number of rows. Obviously, this won't be exact, but I'm guessing it would trend towards it. – Matthew Wood Aug 09 '10 at 14:10
3

It shows less time because:

1) The Total runtime shown by EXPLAIN ANALYZE includes executor start-up and shut-down time, as well as the time to run any triggers that are fired, but it does not include parsing, rewriting, or planning time.

2)Since no output rows are delivered to the client, network transmission costs and I/O conversion costs are not included.

Warning!

The measurement overhead added by EXPLAIN ANALYZE can be significant, especially on machines with slow gettimeofday() operating-system calls. So, it's advisable to use EXPLAIN (ANALYZE TRUE, TIMING FALSE).

dpaks
  • 375
  • 1
  • 13
  • 1
    thanks. I actually downloaded the data to my machine and still it takes about 13 seconds while in the EXPLAIN ANALYZE it's about 0.5 second. how can I optimize data writing? – Dejell Feb 22 '17 at 12:08