1

I want to do a super fast geocode lookup, returning co-ordinates for an input of Town, City or Country. My knowledge is basic but from what I understand writing it in C is a good start. I was thinking it makes sense to have a tree structure like this:

  • England
    • Kent
      • Orpington
      • Chatham
      • Rochester
      • Dover
      • Edenbridge
    • Wiltshire
      • Swindon
      • Malmesbury

In my file / database I will have the co-ordinate and the town/city name. If I give my program the name "Kent", I want it to return the co-ordinate associated with "Kent" in the fastest way possible.

Should I store the data in a binary file or a SQL database for performance reasons? What is the best method of searching this data, perhaps a binary search tree? And how should the data be stored?

J.Zil
  • 6
    Geocoding and autocomplete have nothing to do with each other. – SLaks Jul 30 '12 at 13:20
  • 2
    Why do you think C++ will help? C++ code might be faster by a minuscule amount, but this will be far outweighed by the time spent querying your file/database/whatever, and the complexity of development in C++ will be a big overhead if you've no experience. – Dan Puzey Jul 30 '12 at 13:22
  • I feel like I'm in the lions' den here. I edited the original post to reflect that what I am interested in is returning co-ordinates for the places it finds. I added c, c# and c++ because when I said I want it coded in C, I meant any of these. I have no experience coding in C, C# or C++, so perhaps one is better suited to this. – J.Zil Jul 30 '12 at 13:26
  • 1
    Geocoding is bound to be a *huge* bottleneck. I don't think it matters very much how fast your code is (it will always be orders of magnitude *too fast* compared to geocoding itself) – Alex Jul 30 '12 at 13:34
  • Programming language is really not the main concern here. You need to figure out what exactly you are trying to do and how you are going to do it first. Once you have a clear plan for what data you want to store, how to store it and how you are going to search it, then you can think about which language you'll use to implement it. The best choice would be a language you are familiar with. – sth Jul 30 '12 at 13:35
  • Ok, thank you for the input. I haven't been too clear in my question. In my file / database I will have the co-ordinate and the town/city name. If I give my program the name "Kent", I want it to return the co-ordinate associated with "Kent" in the fastest way possible. – J.Zil Jul 30 '12 at 13:40
  • 1
    @JamesWillson: That right there made perfect sense. Edit that into your question. – Linuxios Jul 30 '12 at 13:40
  • 1
    I suggest using a database. Let the database worry about data structures and fast retrieval methods. That's what they are designed for. – Thomas Matthews Jul 30 '12 at 14:28
  • Are you recommending a database like SQLite or a non-relational one like Mongo? Also, would the speed not be inferior? – J.Zil Jul 30 '12 at 14:38
  • I don't know, depends on your situation. At a GPS shop that I worked at, they used a spatial database. You may want to investigate that. – Thomas Matthews Jul 30 '12 at 23:58

3 Answers

4

Here's a little advice, but not much more than that:

If you want to find places by name, or name prefix, as you indicate that you wish to, then you would be ill-advised to set up a data structure which stores the data in a hierarchy of country, region, town as you suggest you might. If you have an operation that dominates the use of your data structure you are generally best picking the data structure to suit the operation.

In this case an alphabetical list of places would be better suited to your queries. To each place not at the topmost level you would want to add some kind of reference to the name of its 'parent'. If you have an alphabetical list of places you might also want to consider an index, perhaps one which points directly to the first place in the list which starts with each letter of the alphabet.
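A minimal sketch of such an alphabetical list in C, assuming the whole list fits in memory as a sorted array. The `Place` struct, its field names, the `find_place` function and the coordinate values are all hypothetical placeholders for illustration, not real data. Each entry carries the name of its parent, and the standard library's `bsearch` does the lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One gazetteer entry. Field names are illustrative assumptions. */
typedef struct {
    const char *name;   /* place name, used as the search key */
    const char *parent; /* name of the enclosing place, NULL at the top level */
    double lat;         /* placeholder value, not a real coordinate */
    double lon;         /* placeholder value, not a real coordinate */
} Place;

/* Must stay sorted by name in strcmp order, or bsearch will not work. */
static const Place places[] = {
    { "Dover",      "Kent",      1.0, 1.5 },
    { "Edenbridge", "Kent",      2.0, 2.5 },
    { "England",    NULL,        3.0, 3.5 },
    { "Kent",       "England",   4.0, 4.5 },
    { "Swindon",    "Wiltshire", 5.0, 5.5 },
    { "Wiltshire",  "England",   6.0, 6.5 },
};

/* bsearch passes the key first and the array element second. */
static int compare_by_name(const void *key, const void *elem)
{
    const char *name = key;
    const Place *place = elem;
    return strcmp(name, place->name);
}

/* Returns the matching entry, or NULL if the name is not in the list. */
const Place *find_place(const char *name)
{
    assert(name != NULL);
    return bsearch(name, places, sizeof places / sizeof places[0],
                   sizeof places[0], compare_by_name);
}
```

With this layout, `find_place("Kent")` returns the entry whose `parent` is "England", and following `parent` names upwards recovers the hierarchy without storing the data as a tree. The lookup is an exact, case-sensitive match; prefix search would need a different comparison function.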

As you describe your problem it seems to have much more in common with storing words in a dictionary (I mean the sort of thing in which you look up words rather than any particular collection data-type in any specific programming language which goes under the same name) than with most of what goes under the guise of geo-coding.

My guess would be that a gazetteer including the names of all the world's towns, cities, regions and countries (and their coordinates) which have a population over, say, 1000, could be stored in a very simple data structure (basically a list) with an index or two for rapid location of the first A place-name, the first B, and so on. With a little compression you could probably hold this in the memory of most modern desktop PCs.
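The "first A place-name, the first B" index could look roughly like the sketch below. It assumes an ASCII character set and that every name in the sorted list starts with an uppercase letter; the names `first_with`, `build_letter_index` and `find_name` are made up for this example. The index stores where each letter's run begins, so a lookup only scans the names sharing the query's first letter:

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define LETTERS 26

/* Sorted by strcmp; assumed to start with uppercase ASCII letters. */
static const char *const names[] = {
    "Dover", "Edenbridge", "England", "Kent", "Swindon", "Wiltshire",
};
enum { NAME_COUNT = sizeof names / sizeof names[0] };

/* first_with[i] is the position of the first name starting with 'A' + i.
 * The extra final entry holds NAME_COUNT so every letter has an end bound. */
static int first_with[LETTERS + 1];

/* Maps a string to 0..25 by its first letter, or -1 if it has none. */
static int letter_slot(const char *s)
{
    int c = toupper((unsigned char)s[0]);
    return (c >= 'A' && c <= 'Z') ? c - 'A' : -1;
}

/* One pass over the sorted list fills in the start of each letter's run. */
void build_letter_index(void)
{
    int pos = 0;
    for (int slot = 0; slot <= LETTERS; slot++) {
        while (pos < NAME_COUNT && letter_slot(names[pos]) < slot)
            pos++;
        first_with[slot] = pos;
    }
}

/* Returns the position of name in the list, or -1 if it is absent. */
int find_name(const char *name)
{
    assert(name != NULL);
    int slot = letter_slot(name);
    if (slot < 0)
        return -1;
    for (int i = first_with[slot]; i < first_with[slot + 1]; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}
```

Call `build_letter_index()` once after loading the list, then `find_name("Kent")` scans only the K entries. For a large gazetteer a second-level index, or a binary search inside each letter's range, would probably be worth adding.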

High Performance Mark
  • You can have the records in any order you wish. You should create one or more *indices* which are associative arrays containing the key (town name) and value (pointers to other information). This way you can access data fast without worrying about the organization of the records. See also database theory. – Thomas Matthews Jul 30 '12 at 14:26
  • @ThomasMatthews: I think that you should repost your 'comment' as an answer since it offers advice quite different from my own. – High Performance Mark Jul 30 '12 at 14:33
  • Yes, would you be willing to explain this a little more please? I am reading up on indices now. – J.Zil Jul 30 '12 at 14:57
1

I think the best advice I can give is to use whatever language you are familiar with to get the results you want. Worry about performance once your code works. Then you can look at translating very specific pieces of functionality into C or C++ one at a time until you have the results you want.

Robert H
1

You should not worry about how the information is stored, except not to duplicate data.

You should create one or more indices for the data. An index is an associative array (map) data structure that contains a key (the item you want to search for) and a value (such as the record and other information associated with the key). This gives you fast lookups without having to reorganize your data for each type of search.
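As a rough illustration of such an index in C, here is a sketch of a small hash table that maps a town name to the position of its record, while the records themselves stay in whatever order they were loaded. The `Record` struct, the function names and the coordinate values are hypothetical placeholders; the hash is 32-bit FNV-1a with linear probing, chosen only because it is short:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    double lat; /* placeholder value, not a real coordinate */
    double lon; /* placeholder value, not a real coordinate */
} Record;

/* Records are deliberately unsorted; the index does the finding. */
static const Record records[] = {
    { "Swindon", 1.0, 2.0 },
    { "Kent",    3.0, 4.0 },
    { "Dover",   5.0, 6.0 },
};

/* SLOT_COUNT must exceed RECORD_COUNT so probing always finds a gap. */
enum { RECORD_COUNT = sizeof records / sizeof records[0], SLOT_COUNT = 8 };

/* Each slot holds a record position plus one; zero means empty. */
static int slots[SLOT_COUNT];

/* 32-bit FNV-1a hash of the name, reduced to a slot number. */
static size_t hash_name(const char *s)
{
    unsigned long h = 2166136261UL;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h = (h * 16777619UL) & 0xFFFFFFFFUL;
    }
    return (size_t)(h % SLOT_COUNT);
}

/* Builds the index: key is the name, value is the record position. */
void build_name_index(void)
{
    memset(slots, 0, sizeof slots);
    for (int i = 0; i < RECORD_COUNT; i++) {
        size_t s = hash_name(records[i].name);
        while (slots[s] != 0) /* linear probing on collision */
            s = (s + 1) % SLOT_COUNT;
        slots[s] = i + 1;
    }
}

/* Returns the record for name, or NULL if the index has no such key. */
const Record *lookup(const char *name)
{
    assert(name != NULL);
    size_t s = hash_name(name);
    while (slots[s] != 0) {
        const Record *r = &records[slots[s] - 1];
        if (strcmp(r->name, name) == 0)
            return r;
        s = (s + 1) % SLOT_COUNT;
    }
    return NULL;
}
```

After one call to `build_name_index()`, `lookup("Kent")` returns a pointer to the Kent record. A second index keyed on something else, such as a postcode, could point into the same `records` array without duplicating any data.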

On the other hand, your case is an excellent fit for a database. I suggest you let the database manage your data (including efficient lookups). After all, that is what databases are designed for.

See also: At what point is it worth using a database?

Thomas Matthews