What is the leanest way to store a (big) full-text index that supports lookup of incomplete words? For example, an index lookup for "colo"
should return "Colorado" (among other things). For context, I am indexing about 60,000 geographical entities (countries, regions/states, metro areas, and cities).
In my first attempt, I indexed every prefix of each word, from two characters in length up to the full word. For example, for the word "Colorado", I created the following index entries:
co
col
colo
color
colora
colorad
colorado
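The scheme above can be sketched as follows (the function name and minimum length are illustrative, not from any particular library):

```python
def prefixes(word, min_len=2):
    """Generate all lowercase prefixes of `word`,
    from `min_len` characters up to the full word."""
    w = word.lower()
    return [w[:i] for i in range(min_len, len(w) + 1)]

entries = prefixes("Colorado")
# entries == ['co', 'col', 'colo', 'color', 'colora', 'colorad', 'colorado']
```

So a word of length n produces n - 1 index entries, which is how 60,000 entities balloon to roughly 160,000 entries.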
But that resulted in about 160,000 index entries. I'm trying to reduce this to something more reasonable while retaining the ability to match on incomplete words. What optimizations should I consider to keep the index small?