
I was wondering which is faster: combining several SELECTs into one query with JOINs, or running the separate SELECTs individually?

I'm working on a task that involves several tables (currently 4, but the number will probably grow) related to each other, which I query via LEFT JOIN. Will sending such SQL queries to PostgreSQL push it over the edge?
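
For reference, here's a stripped-down version of the kind of query I mean (the table and column names are just made-up examples):

    SELECT u.name, p.title, c.body, t.label
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    LEFT JOIN comments c ON c.post_id = p.id
    LEFT JOIN tags t ON t.post_id = p.id;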

What do you think from your experience :)

tftd
  • 4 tables isn't "many tables"; it's almost nothing. Most of our SELECT statements have 10 to 20 joins in them, no problem at all for PostgreSQL. Use EXPLAIN, that's what you need. – Frank Heikens Jan 27 '11 at 07:58
  • Thanks for sharing. Does your application have lots of users? The site I'm writing this SQL for is going to have somewhere between 500-1000 concurrent users. Do you think that will be a problem for an 8-CPU server with 2 GB of RAM? – tftd Jan 27 '11 at 16:07
  • When you have 500-1000 concurrent connections, you need a connection pool. We have up to 1500 concurrent users working in the application and doing their thing, on a pool of just 100 concurrent connections. Works great, also for queries with many joins. 2 GB of RAM is a problem; that's not going to give you much performance. For 1000 concurrent connections, you need as much RAM as you can buy. RAM is cheap. – Frank Heikens Jan 28 '11 at 08:55

2 Answers


The answer is "it depends on your database", though generally writing the query as a JOIN is preferred: it's easier to read, and it gives the query planner some hints as to what you're doing.

You should ask Postgres to EXPLAIN (or EXPLAIN ANALYZE) the query plan for your query and see if there are any substantial differences - at the same time you might also spot some other things you can optimize, like indexing. See the Postgres manual for more info on using EXPLAIN.
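
For example, prefixing a query with EXPLAIN ANALYZE makes Postgres actually execute it and report the real plan and timings (the tables and columns here are hypothetical):

    EXPLAIN ANALYZE
    SELECT u.name, p.title
    FROM users u
    LEFT JOIN posts p ON p.user_id = u.id
    WHERE u.id = 42;

Run the same thing over the equivalent separate SELECTs and compare; the difference in total cost and actual times is usually telling.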

voretaq7

Generally speaking, there is a cost to prepare each query and a cost to retrieve the data. For small datasets, executing SELECTs in a loop isn't that bad, but if you do a SELECT on a table with 1000 rows and then, for each of those rows, another SELECT on a different 1000-row table, the difference will be very noticeable, even if the looped SELECTs are executed from a query prepared in advance.
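
As a sketch of the two patterns (with hypothetical table names), the looped approach issues one query plus one more per row, while the JOIN needs a single round trip:

    -- Looped approach: 1 query, then one probe per row returned
    SELECT id, name FROM parent;
    SELECT detail FROM child WHERE parent_id = $1;  -- executed once per parent row

    -- JOIN approach: one query, one round trip
    SELECT p.id, p.name, c.detail
    FROM parent p
    LEFT JOIN child c ON c.parent_id = p.id;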

Even if the cost of preparing each query in the loop were zero, the JOIN could still reduce the total amount of data retrieved. For instance, if you are joining your 1000-row table against a table with only one matching row, the JOIN version of the query returns one row, while the separate SELECTs return 1000 rows from the first table, with the loop then producing 999 empty sets and 1 row.
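
To illustrate with made-up tables, suppose only one row in a 1000-row orders table has a matching row in shipments:

    -- JOIN version: returns just the single matching row
    SELECT o.id, s.shipped_at
    FROM orders o
    JOIN shipments s ON s.order_id = o.id;

    -- Looped version: 1000 rows come back from the first query,
    -- then 1000 probes follow, 999 of which return nothing
    SELECT id FROM orders;
    SELECT shipped_at FROM shipments WHERE order_id = $1;  -- once per order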

If you are requesting one specific item from each table rather than looping over a set of rows, then the difference between one "big" query and 4 little queries is probably minuscule. As voretaq7 said, getting PostgreSQL to EXPLAIN what each query will do, and how long it will take to do it, would go a long way towards figuring out exactly what will happen.

DerfK