1

I have a database with:

[business] must have a [location]
[location] may have many [business]
[location] may have many [postcode]
[postcode] may be for many [location]
[postcode] must have a [state]

I want a query to return the top 10 locations for each state in terms of the business count.

To make more sense, it may come out:

Western Australia
  - perth
  - bunbury
  - etc. up to 10

(i.e. perth has the most businesses listed in Western Australia)

Victoria
  - melbourne
  - st kilda
  - etc. up to 10

Etc.

I was hoping to achieve this without using UNION. I haven't done complex SQL for a while and it's hurting my head.

I am using MySQL.

A similar way to put this would be if you had categories, products and orders and wanted the the top ten products in terms of order count for each category.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
oak
  • 119
  • 2
  • 7
  • 1
    You're going to need to include your table definitions if you want people to help you craft a MySQL query – Dancrumb Jan 13 '11 at 03:10
  • If you use the categories, products and orders example at the end you could imagine a standard table definition. It's just as much a conceptual question about how to achieve this. I'm happy to write the query I just have no idea how to structure it without using union. – oak Jan 13 '11 at 03:15
  • Tagged with `greatest-n-per-group` to guide you to similar questions -- I saw someone else do this (and took the exact comment! :-). Anyway, this would done with SQL Server with `ROW_NUMBER(...PARTITION ON...)` so it's a matter of finding a (nice) MySQL equivalent, if it exits. –  Jan 13 '11 at 05:01
  • This question perked my interest, so I started looking at it ... best explanation/example I could find was http://explainextended.com/2009/03/05/row-sampling/. It uses some messy 'side-effecting' variables but the article states it runs in `Omega(n)` so it should be fine. This is similar as cyberkiwi proposed below. –  Jan 13 '11 at 05:06

1 Answers1

1

This query should work.

select * from (
    select a.state, a.location_id, C,
        @n:=case when @s=a.state then @n+1 else 1 end counter,
        @s:=a.state
    from (select @s:=null) b, (
        select pc.state, b.location_id, COUNT(b.location_id) C
        from postcode pc
        inner join location_postcode lp on lp.postcode_id=pc.id
        inner join business b on b.location_id=lp.location_id
        group by pc.state, b.location_id
        order by pc.state, C desc
    ) a
) c
where counter <= 10

I used field names that should be easy to follow, assuming these tables exist with stated relationships:

business M-1 location
location M-M postcode
    (expands to) => location 1-M location_postcode M-1 postcode
postcode M-1 state

The query has beentested with this data:

create table business (location_id int);
insert into business select floor(rand()*10);
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
create table location_postcode (location_id int, postcode_id int);
insert into location_postcode select 1,1;
insert into location_postcode select 2,1;
insert into location_postcode select 3,1;
insert into location_postcode select 4,2;
insert into location_postcode select 5,1;
insert into location_postcode select 5,2;
insert into location_postcode select 6,1;
insert into location_postcode select 6,3;
insert into location_postcode select 7,1;
insert into location_postcode select 7,4;
insert into location_postcode select 8,5;
insert into location_postcode select 9,6;
insert into location_postcode select 10,7;
create table postcode (id int, state int);
insert into postcode select 1,1;
insert into postcode select 2,2;
insert into postcode select 3,3;
insert into postcode select 4,4;
insert into postcode select 5,4;
insert into postcode select 6,5;
insert into postcode select 7,5;

Which doesn't quite create enough records for a "top 10" from each, but you will see how the COUNTER column is ranking correctly. To see it working against this small data set, first leave this filter in there

where counter <= 10

to check the COUNTER column, then reduce it to something like 2 or 3, to show only the top 2 or 3 per state.

RichardTheKiwi
  • 105,798
  • 26
  • 196
  • 262