2

I have a relatively large (as in >10^6 entries) table called "things" which represent locateable objects, e.g. countries, areas, cities, streets, etc. They are used as a tree of objects with a fixed depth, so the table structure looks like this:

id
name
type
continent_id
country_id
city_id
area_id
street_id
etc.

The association inside "things" is 1:n, i.e. a street or area always belongs to a defined city and country (not two or none); the column city_id for example contains the id of the "city" thing for all the objects which are inside that city. The "type" column contains the type of thing (Street, City, etc) as a string.

This table is referenced in another table "actions" as "thing_id". I am trying to generate a table of action location statistics showing the number of active and inactive actions a given location has. A simple JOIN like

SELECT count(nullif(actions.active, 1)) AS icount, 
       count(nullif(actions.active, 0)) AS acount, 
       things.name AS name, things.id AS thing_id, things.city_id AS city_id
  FROM "actions" 
  LEFT JOIN things ON actions.thing_id = things.id 
 WHERE UPPER(substring(things.name, 1, 1)) = UPPER('A') 
   AND actions.datetime_at BETWEEN '2012-09-26 19:52:14' AND '2012-10-26 22:00:00'
 GROUP BY things.name, things.id ORDER BY things.name

will give me a list of "things" (starting with 'A') which have actions associated with them and their active and inactive count like this:

icount | acount | name                      | thing_id | city_id
------------------------------------------------------------------
     0        5   Brooklyn, New York City   | 25       | 23
     1        0   Manhattan, New York City  | 24       | 23
     3        2   New York City             | 23       | 23

Now I would like to

  • only consider "city" things (that's easy: filter by type in "things"), and
  • in the active/inactive counts, use the sum of all actions happening in this city - regardless of whether the action is associated with the city itself or something inside the city (= having the same city_id). With the same dataset as above, the new query should result in
icount | acount | name                      | thing_id | city_id
------------------------------------------------------------------
     4        7   New York City             | 23       | 23

I do not need the thing_id in this table (since it would not be unique anyway), but since I do need the city's name (for display), it is probably just as easy to also output the ID, then I don't have to change as much in my code.

How would I have to modify the above query to achieve this? I'd like to avoid additional trips to the database, and advanced SQL features such as procedures, triggers, views and temporary tables, if possible.

I'm using Postgres 8.3 with Ruby 1.9.3 on Rails 3.0.14 (on Mac OS X 10.7.4).

Thank you! :)

Jens
  • 1,386
  • 14
  • 31
  • 1
    8.3 will be unsupported soon, so it's time to plan your upgrade moderately urgently. There should be no major compatibility roadblocks from 8.3 to 9.2, just watch out for the `bytea_output` change. – Craig Ringer Sep 27 '12 at 00:13
  • Thank you - yes, we are in dire need of upgrades. Current production still runs Ruby 1.8.7 with Rails 3.0 (patched) and PG 8.3 ... working on it! :) – Jens Sep 27 '12 at 10:23

1 Answers1

2

You need to count actions for all things in the city in an independent subquery and then join to a limited set of things:

SELECT c.icount
      ,c.acount      
      ,t.name
      ,t.id AS thing_id
      ,t.city_id
FROM  (
   SELECT t.city_id
         ,count(nullif(a.active, 1)) AS icount
         ,sum(a.active) AS acount
   FROM   things t 
   LEFT   JOIN actions a ON a.thing_id = t.id 
   WHERE  t.city_id = 23           -- to restrict results to one city
   GROUP  BY t.city_id
   ) c                             -- counts per city
JOIN   things t USING (city_id)
WHERE  t.name ILIKE 'A%'
AND    t.datetime_at BETWEEN '2012-09-26 19:52:14'
                         AND '2012-10-26 22:00:00'
ORDER  BY t.name, t.id;

I also simplified a number of other things in your query and used table aliases to make it easier to read.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228