0

I have the simplest table in the world, for looking up lat/lng values for a UK postcode (loaded with full UK postcode data):

CREATE TABLE postcodes (
  postcode char(7) NOT NULL,
  lat double(10,6) NOT NULL,
  lng double(10,6) NOT NULL,
  KEY postcode (postcode)
)

Postcodes in 'postcode' field either have 2 digits at the end of the first half, or one and then a space. I think the space is important for the integrity of how they are matched (??), and besides I don't want to remove the spaces in the table, as I'm also pulling out the postcodes for display purposes (and I don't want a duplicate field, because I'm fussy!). Examples:

'LE115AF', 'BS6 5EE', 'W1A 1AA', 'BS216RS', 'M3 1NH'

So, some have spaces, some don't. Most are 7 chars overall, some only 6.

Anyway, point being is I want users to be able to enter postcode queries, including partial postcodes, with or without spaces, and always find a match if their input string is valid (i.e. they don't enter a full or partial postcode which doesn't exist in the table).

This is how I've done it so far (with some help from PHP):

{...} WHERE `postcode` LIKE '" . str_replace(' ','%',$query) . "%' LIMIT 1

This is good for:

  • full postcodes which don't contain spaces in the db
  • partial postcodes if a space has been entered and there is a corresponding space in the db, or the queried portion stops short of where the space occurs (e.g. 'W1A' will match 'W1A 1AA', 'M3 1' will match 'M3 1AR', etc).

But doesn't work for these queries:

  • 'W1A1AA' should match 'W1A 1AA'
  • 'BS65EE' should match 'BS6 5EE'
  • 'BS65' should match the first 'BS6 5%' postcode in db, which is 'BS6 5AA'
  • 'M31' should likewise match 'M3 1AR'

I'm guessing I need to somehow do some MySQL string function magic to work out if there's a space in the row's postcode field, and adjust my WHERE clause logic accordingly? Anyone got any advice on the best approach? I ideally want to also:

  • avoid MySQL stored procedures (inline functions preferred)
  • do nothing more than inline string functions in the PHP part too
halfer
  • 19,824
  • 17
  • 99
  • 186
Jonny Nott
  • 328
  • 3
  • 14

6 Answers6

5

Create a new column which is just the postcode field with the spaces stripped, and create a unique index on it. You shouldn't find any duplicates. That should reassure you that the space really isn't important :)

Then use that for the lookup, after stripping spaces on your input postcodes.

Bear in mind that solutions that involve applying string functions to the postcode column of the table may stop MySQL from using any indexes on that column. (The index is based on the exact data in the column, so if you start applying functions to that data, the optimiser will generally decide that the index is useless.)

If you do feel the need to reformat things, the easiest option is to work from the knowledge that while the "outbound" part of the postcode -- the part before the space -- varies in format a little, the "inbound" part -- the part after the space -- is always a single digit followed by two letters.

Probably the best resource on the format I've ever found is the Wikipedia entry, by the way.

Matt Gibson
  • 37,886
  • 9
  • 99
  • 128
  • Thanks Matt - I've gone with your stripped-spaces field suggestion in the end. I tried jochen's REGEXP suggestion, but that made the query realllllly sloooow, probably because it has to perform that REGEXP on every one of the > 1 million entries. I suppose that also knocks out the index as you suggest. I've actually arrived at a bit of a hybrid solution - see additional answer. – Jonny Nott Mar 11 '11 at 09:24
1

you can remove spaces on database level as well:

{...} WHERE replace(`postcode`, ' ','') LIKE '" . str_replace(' ','%',$query) . "%' LIMIT 1
konsolenfreddy
  • 9,551
  • 1
  • 25
  • 36
  • Good suggestion, but see Matt's point: "Bear in mind that solutions that involve applying string functions to the postcode column of the table may stop MySQL from using any indexes on that column". Basically, this approach will make the query too slow. – Jonny Nott Mar 12 '11 at 10:17
1

First, I don't think that the space is important. The description on the Royal Mail web page has no mention of spaces. Also for every post code with the space I have seen, the second group was always 3 characters long, so probably you can split it from the back. The web page says "is usually just one number" though, so there may be exceptions.

If you are willing to preprocess the query string (like you did using php in your example), you can solve the problem as follows: you convert your query post code into a regular expression by (1) removing all spaces and then (2) adding ? (i.e. an optional match for a space) between all characters. Finally add a .* at the end to allow for incomplete codes. Examples:

  • W1A1AA becomes W ?1 ?A ?1 ?A ?A.* . This matches "W1A1AA" and "W1A 1AA".
  • M31 becomes M ?3 ?1.*.

Once you have the query post code in this form, you can use MySQL's REGEXP operator for matching:

{...} WHERE `postcode` LIKE 'M ?3 ?1.*' LIMIT 1

Finally, as an aside, your trick of replacing ' ' by % is a bit dangerous. This way BS6 5 would match BS6 456, because the % would match the 4.

jochen
  • 3,728
  • 2
  • 39
  • 49
  • Agree jochen, found that the space isn't important. Good suggestions. Unfortunately, REGEXP matching proves way to expensive, taken the query to > 6 secs on my setup, which isn't workable. Good point though about the wildcard % in LIKE. – Jonny Nott Mar 11 '11 at 09:20
0

I would generate new entries in the table for the sector and area, which would eliminate the need for the LIKE.

A LIKE condition cannot use indexes and therefore needs to scan the entire table for a result. This is slow, especially when you have a full UK postcode database of 1.7 million values.

So, create a new entry for just 'M3' as the postcode. Do the same for 'M31', and so on. As for the lat/long values that correspond to these new entires, you can do some basic maths to calculate the average location of all its individual postcodes.

Try this:

SELECT
  SUBSTRING(postcode, 1, LOCATE(' ', postcode) - 1),
  AVG(lat),
  AVG(long)
FROM
  postcodes
GROUP BY
  SUBSTRING(postcode, 1, LOCATE(' ', postcode) - 1)

You can then just feed the results back into your postcodes table.

Phil Kingston
  • 131
  • 1
  • 4
0

You can query it by splitting all letters.

WHERE `postcode` LIKE '" . implode("%", str_split("W1A1AA")) . "%' LIMIT 1

Because you're querying a field of limited length this shouldn't create too many problems as far as false positives and you can order by likeness in code after retrieval (I assume this is for an autocomplete). Performance should be bad.

Novikov
  • 4,399
  • 3
  • 28
  • 36
0

Matt's solution worked nicely. However, I needed to still allow users to specifically have a space within their query, and handle that, i.e:

  • 'M31' should match 'M31 4AA', whereas
  • 'M3 1' should match 'M3 1AR'

So, my enhanced solution (solves the above problem):

CREATE TABLE postcodes (
  postcode varchar(7) NOT NULL,
  postcode_display char(7) NOT NULL,
  lat double(10,6) NOT NULL,
  lng double(10,6) NOT NULL,
  UNIQUE KEY postcode (postcode),
  UNIQUE KEY postcode_display (postcode_display)
)

postcode has spaces stripped out, postcode_display has them left in..

<?php
if (strlen($query) <= 7 && strpos($query,' ') !== false) { $hasSpace = true; }
?>

..

WHERE `postcode" . ($hasSpace ? '_display' : '') . "` LIKE '" . str_replace(' ',($hasSpace ? '%' : ''),$query) . "%' LIMIT 1

Any room for further refinements?

Jonny Nott
  • 328
  • 3
  • 14
  • Yes, why are you storing all the postcode data twice? It's a total waste of space - as many commenters have observed, you can simply strip out the space for storage in your DB. Then, for display, just re-insert from the back (i.e. insert a space at index postcode.Length() - 3) – Carlos P Feb 26 '12 at 16:00