Faster searching in Python - Postcodes

Question

I have been working on a no-sql solution to naming a list of N postcodes using a national list of postcodes. So far I have my reference dictionary for the state of NSW in the form :

{'Belowra': 2545, 'Yambulla': 2550, 'Bingie': 2537, ... [n=4700]

My function uses this to look up the names of a postcode:

def look_up_sub(pc, settings):
    output=[]
    for suburb, postcode in postcode_dict.items():
        if postcode == pc and settings=='random':#select match at random
            print(suburb)                        #remove later
            output.append(suburb)
            break                                #stop searching for matches
        elif postcode == pc and settings=='all': #print all possible names for postcode
            print(suburb)                        #remove later
    return output 

N=[2000,2020,2120,2019]
for i in N:
    look_up_sub(i, 'random')

>>>Millers Point
>>>Mascot
>>>Westleigh
>>>Banksmeadow

While ok for small lists, when N is sufficiently large this inefficient approach is very slow. I have been thinking about how I could use numpy arrays to speed this up considerably and am looking for faster ways to approach this.

Why are you *iterating over your dictionary* to find a match? That defeats the whole point, and you might as well have a list of tuples. Your data structure is backwards: it should go from `postcode:suburb`, so that when you pass it a `pc` you get a list of suburbs back. Then either select from that list randomly or print all of them. — juanpa.arrivillaga, Apr 14 '17 at 00:04
Agreed! The beauty of the dictionary is O(1) lookup, iterating over it really defeats the point — Christopher Apple, Apr 14 '17 at 00:05
That definitely helped thanks `postcode_dict = dict(zip(postcode,suburb)) print(postcode_dict[2000])` — lm5050, Apr 14 '17 at 00:33

score 0 · Accepted Answer · answered Apr 14 '17 at 00:08

Your data structure is backwards: it should go from `postcode:suburb`, so that when you pass it a `pc` you get a list of suburbs back. Then either select from that list randomly or print all of them. Here is what you should do. First, invert your dict:

from collections import defaultdict
post_to_burb = defaultdict(list)
for suburb, postcode in postcode_dict.items():
    post_to_burb[postcode].append(suburb)

Now, your function should do something like:

import random
def look_up_sub(pc, settings):
    output = []
    if settings == "random":
        output.append(random.choice(post_to_burb[pc]))
    elif settings == 'all':
        output.extend(post_to_burb[pc])
    return output

Using numpy here would be unwieldy, especially since you are working with strings. You might get some marginal improvement in runtime, but your overall algorithm would still be linear time per lookup. With the approach above, each lookup is constant time once you've set up your post_to_burb dict.
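Putting the two pieces together, a minimal self-contained sketch might look like the following. The sample data is illustrative only (the real dictionary has about 4700 entries), and the `index` parameter, the `build_index` helper, and the empty-list result for unknown postcodes are assumptions added here, not part of the original code.

```python
import random
from collections import defaultdict

# Illustrative sample only; the real postcode_dict is much larger.
postcode_dict = {
    "Belowra": 2545,
    "Yambulla": 2550,
    "Bingie": 2537,
    "Millers Point": 2000,
    "The Rocks": 2000,
}


def build_index(suburb_to_postcode):
    """Invert suburb -> postcode into postcode -> list of suburbs (one O(n) pass)."""
    index = defaultdict(list)
    for suburb, postcode in suburb_to_postcode.items():
        index[postcode].append(suburb)
    # Return a plain dict so later lookups of unknown codes do not insert keys.
    return dict(index)


def look_up_sub(pc, settings, index):
    """Return suburb names for postcode pc; each call is a single dict lookup."""
    suburbs = index.get(pc, [])
    if not suburbs:
        return []  # assumption: unknown postcodes yield an empty list
    if settings == "random":
        return [random.choice(suburbs)]
    if settings == "all":
        return list(suburbs)
    raise ValueError(f"unknown settings value: {settings!r}")


post_to_burb = build_index(postcode_dict)
all_2000 = look_up_sub(2000, "all", post_to_burb)
one_2000 = look_up_sub(2000, "random", post_to_burb)
```

The index is built once up front, so looping over a large list of postcodes costs one dictionary lookup per item rather than a full scan of the suburb dictionary.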

I did something similar and got results for a list of 47534 postcodes in about a second, thanks again. — lm5050, Apr 14 '17 at 01:09

score 0 · Answer 2 · answered Apr 14 '17 at 00:09

Build a dict from postal code to suburbs:

from collections import defaultdict
code_to_urbs = defaultdict(list)
for suburb, postcode in postcode_dict.items():
    code_to_urbs[postcode].append(suburb)

With that done, you can just write code_to_urbs[postal_code].
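One caveat worth knowing: indexing a `defaultdict` with a missing key inserts that key with an empty list. If the input may contain postcodes that are not in the reference data, `.get()` avoids growing the dict. A small sketch, using two sample entries and a made-up unknown code 9999:

```python
from collections import defaultdict

# Two sample entries; 9999 below is a made-up code that is not in the data.
sample = {"Mascot": 2020, "Westleigh": 2120}

code_to_urbs = defaultdict(list)
for suburb, postcode in sample.items():
    code_to_urbs[postcode].append(suburb)

known = code_to_urbs[2020]            # normal lookup of an existing code
missing = code_to_urbs.get(9999, [])  # .get() does not insert the key
size_before = len(code_to_urbs)

_ = code_to_urbs[9999]                # square-bracket access inserts 9999 -> []
size_after = len(code_to_urbs)
```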
