Assume I have the following example hierarchy:
- US
  - Michigan
    - Detroit
    - Grand Rapids
    - Lansing
  - Minnesota
    - Grand Rapids
    - Minneapolis
    - St Paul
  - Ohio
    - Columbus
    - Grand Rapids
    - Sandusky
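To make the ambiguity concrete, here is a minimal sketch of the same hierarchy as a nested Python dict (country, then state, then list of cities). It lists every leaf path and reports the city names that occur under more than one state. The names `HIERARCHY` and `leaf_paths` are hypothetical, chosen for illustration only.

```python
# The example hierarchy as a nested dict: country -> state -> list of cities.
HIERARCHY = {
    "US": {
        "Michigan": ["Detroit", "Grand Rapids", "Lansing"],
        "Minnesota": ["Grand Rapids", "Minneapolis", "St Paul"],
        "Ohio": ["Columbus", "Grand Rapids", "Sandusky"],
    }
}


def leaf_paths(tree):
    """Yield every (country, state, city) path in the hierarchy."""
    for country, states in tree.items():
        for state, cities in states.items():
            for city in cities:
                yield (country, state, city)


paths = list(leaf_paths(HIERARCHY))

# Group states by city name; a city name is ambiguous on its own
# if it appears under more than one state.
city_states = {}
for country, state, city in paths:
    city_states.setdefault(city, []).append(state)
ambiguous = {city: states for city, states in city_states.items() if len(states) > 1}

print(ambiguous)  # {'Grand Rapids': ['Michigan', 'Minnesota', 'Ohio']}
```

Only "Grand Rapids" is ambiguous here, but that one name is enough to show why a city-level term without its parents cannot identify a single place.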
I see two ways that I could index a “Grand Rapids, Michigan” document with prefixed terms:
    XFIRSTLEVELus
    XSECONDLEVELmichigan
    XTHIRDLEVELgrandrapids
or
    XFIRSTLEVELus
    XSECONDLEVELus_michigan
    XTHIRDLEVELus_michigan_grandrapids
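Both term layouts can be generated mechanically from a (country, state, city) path. The sketch below assumes a simple normalization (lowercase, spaces removed, so "Grand Rapids" becomes "grandrapids") and assumes no value contains "_", since "_" is used as the separator in the second scheme. The helper names `normalize`, `independent_terms` and `concatenated_terms` are hypothetical.

```python
# Prefix for each level of the hierarchy, in order (taken from the post).
PREFIXES = ["XFIRSTLEVEL", "XSECONDLEVEL", "XTHIRDLEVEL"]


def normalize(value):
    """Lowercase and drop whitespace: "Grand Rapids" -> "grandrapids"."""
    return "".join(value.lower().split())


def independent_terms(path):
    """Scheme 1: each level's term holds only that level's value."""
    return [prefix + normalize(value) for prefix, value in zip(PREFIXES, path)]


def concatenated_terms(path):
    """Scheme 2: each level's term holds the whole path down to that level."""
    terms, parts = [], []
    for prefix, value in zip(PREFIXES, path):
        parts.append(normalize(value))
        terms.append(prefix + "_".join(parts))
    return terms


path = ("US", "Michigan", "Grand Rapids")
print(independent_terms(path))
# ['XFIRSTLEVELus', 'XSECONDLEVELmichigan', 'XTHIRDLEVELgrandrapids']
print(concatenated_terms(path))
# ['XFIRSTLEVELus', 'XSECONDLEVELus_michigan', 'XTHIRDLEVELus_michigan_grandrapids']
```

The separator choice matters in scheme 2: if a value could itself contain "_", two different paths could produce the same concatenated term.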
I’m inclined to use the second approach, thinking that it will return more intuitive results. That is, a search with Grand Rapids, Michigan as its criteria is less likely to return documents about Grand Rapids, Minnesota or Grand Rapids, Ohio.
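The difference between the two schemes can be checked with a toy inverted index. In this sketch a plain dict mapping each term to the set of documents carrying it stands in for a search database; it is not Xapian's API, and each "document" is simply its own (country, state, city) path. The helper names are hypothetical.

```python
HIERARCHY = {
    "US": {
        "Michigan": ["Detroit", "Grand Rapids", "Lansing"],
        "Minnesota": ["Grand Rapids", "Minneapolis", "St Paul"],
        "Ohio": ["Columbus", "Grand Rapids", "Sandusky"],
    }
}
PREFIXES = ["XFIRSTLEVEL", "XSECONDLEVEL", "XTHIRDLEVEL"]


def normalize(value):
    """Lowercase and drop whitespace: "Grand Rapids" -> "grandrapids"."""
    return "".join(value.lower().split())


def terms(path, concatenate):
    """Scheme 1 when concatenate is False, scheme 2 when it is True."""
    out, parts = [], []
    for prefix, value in zip(PREFIXES, path):
        parts.append(normalize(value))
        out.append(prefix + ("_".join(parts) if concatenate else parts[-1]))
    return out


def build_index(concatenate):
    """Map each term to the set of documents (city paths) indexed with it."""
    index = {}
    for country, states in HIERARCHY.items():
        for state, cities in states.items():
            for city in cities:
                doc = (country, state, city)
                for term in terms(doc, concatenate):
                    index.setdefault(term, set()).add(doc)
    return index


flat = build_index(concatenate=False)
nested = build_index(concatenate=True)

# A query on the third-level term alone.
flat_hits = flat["XTHIRDLEVELgrandrapids"]
nested_hits = nested["XTHIRDLEVELus_michigan_grandrapids"]
print(len(flat_hits))    # 3: Michigan, Minnesota and Ohio
print(len(nested_hits))  # 1: Michigan only

# Scheme 1 reaches the same answer only if the second-level term is ANDed in.
flat_and = flat["XSECONDLEVELmichigan"] & flat["XTHIRDLEVELgrandrapids"]
print(len(flat_and))     # 1
```

Under these assumptions, the concatenated term is unambiguous on its own, while the independent terms give the same result only when the query combines the levels with AND.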
However, two aspects of this approach bother me. First, creating and maintaining a separate term prefix for each level of the hierarchy feels wrong. Second, concatenating the values seems like a surrogate for using weights.
So, what is the best way to represent a hierarchy with term prefixes?