0

So I am trying to get some geographical data at the moment from the some 34 thousand records we have.

We stored a records of Person, which include the basics like firstname, surname, address and postcode.

I am wanting to get information about the postcode, and link it with their Longitue and Latitude co-ordinates. For this, i also have a table with australian postcodes, and their Long/Lat coords.

SELECT count(p.postcode) AS `count`, p.postcode, pctlt.Lat, pctlt.`Long`
FROM person AS p
INNER JOIN pcodetolonglat AS pctlt ON pctlt.Pcode = p.postcode
WHERE length(p.postcode)=4
GROUP BY p.postcode

currently, this script takes between one too two minutes to run,

Is there a more efficient way to get the same results?

Lars
  • 59
  • 1
  • 2
  • 8
  • 2
    Because I want to count the number of 'Persons' who 'lives' in a postcode, and I want to know the 'Long/Lat' Co-ords for each postcode For example: 68 records in postcode 2148, Lat:-33.78 Long: 150.87 – Lars Feb 21 '14 at 03:34
  • I am thinking getting the results of the postcode count first in Person table like "select count(*) 'count, postcode from person group by postcode" and then join the result to pcodetolonglat table to get lat and long. – anonymous Feb 21 '14 at 03:44
  • Do you have indexes on `Pcode` and `postcode`? – Barmar Feb 21 '14 at 03:45
  • Which table is bigger, `person` or `pcodetolonglat`? If `person` is bigger, it might help to do `WHERE length(pctlt.postcode) = 4`. – Barmar Feb 21 '14 at 03:48
  • postcode has a person_id reference, Pcode does not, Person has about 35k records, and about 20 columns, pcodetolonglat has 16.7 k and 12 columns. The reason i have the length operation on the person postcode is because data stored is not 100% consistent, and i dont want information that isn't a postcode. (also some postcode seem to be foreign or not postcodes at all) – Lars Feb 21 '14 at 03:59

1 Answers1

0

This is your query:

SELECT count(p.postcode) AS `count`, p.postcode, pctlt.Lat, pctlt.`Long`
FROM person p INNER JOIN
     pcodetolonglat pctlt
     ON pctlt.Pcode = p.postcode
WHERE length(p.postcode) = 4
GROUP BY p.postcode;

You cannot do much with indexes on this query. An index on person(postcode) should help. That is the first thing to try. The following is rewrite of your query that eliminates the outer aggregation, replacing it with a subquery:

SELECT pctlt.Pcode, pctlt.Lat, pctlt.`Long`,
       (select count(*) from person p where pctlt.Pcode = p.postcode) as `count`
FROM pcodetolonglat pctlt 
WHERE length(pctlt.Pcode) = 4 and
      exists (select 1 from person p where pctlt.Pcode = p.postcode);

The exists clause is to mimic the effect of the inner join.

With an index on person(postcode) this query may run faster than the original. I don't like replacing group by with correlated subqueries, but in MySQL it sometimes has better performance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Your Solution here was a lot faster, how ever with 16k different postcode, this gives me approx 15k useless results, as I need too feed the results of this query into another algorithm, I'm only wanting results with a count of at least 1. – Lars Feb 21 '14 at 04:09
  • Thank you for the update. I understand how using an index would be a lot more beneficial for this reason, and after this thread I believe a slight restructure and clean up of the 35k records would be to referent postal code id's instead of postcodes them self would be a more efficient process. – Lars Feb 21 '14 at 04:18