
Is there a way to get facet counts based on a substring of a facet field, akin to an EdgeNGram?


I'm using Solr to store geohash strings at high precision, and I want to count the number of documents at a chosen, coarser geohash precision. Facets are used to count the documents in each geohash 'cell'.

At the moment, the only way I can see to do this is using tiers of geohashes.

E.g., the current facet result set (from the indexed data):

<lst name="facet_counts">
 <lst name="facet_fields">
  <int name="svztdm7w">11</int>
  <int name="sv87rzt8">3</int>
  <int name="sv83t6bf">2</int>
  <int name="syqxp43m">4</int>
  <int name="syr9f0v2">4</int>
  <int name="syp8p8hb">3</int>
  <int name="tuuttmtt">3</int>
  <int name="twj1ynm3">3</int>
  <int name="w30n6u71">3</int>
 </lst>
</lst>

What I want at precision 1 setting:

<int name="s">27</int>
<int name="t">6</int>
<int name="w">3</int>

What I want at precision 2 setting:

<int name="sv">16</int>
<int name="sy">11</int>
<int name="tu">3</int>
<int name="tw">3</int>
<int name="w3">3</int>
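The desired roll-up amounts to summing the full-precision counts by leading substring. A minimal Python sketch of that arithmetic, done client-side rather than inside Solr (the function name `rollup` is made up for illustration; the counts are the ones from the facet result above):

```python
from collections import Counter

# Full-precision facet counts copied from the result set above.
full_counts = {
    "svztdm7w": 11, "sv87rzt8": 3, "sv83t6bf": 2,
    "syqxp43m": 4, "syr9f0v2": 4, "syp8p8hb": 3,
    "tuuttmtt": 3, "twj1ynm3": 3, "w30n6u71": 3,
}

def rollup(counts, precision):
    """Sum counts over the first `precision` characters of each geohash."""
    totals = Counter()
    for geohash, count in counts.items():
        totals[geohash[:precision]] += count
    return dict(totals)

print(rollup(full_counts, 1))
print(rollup(full_counts, 2))
```

This only shows what the numbers should be; it does not scale, because it needs every full-precision facet value returned to the client first.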

Cheers.

Sensai

1 Answer


I've done a lot of work with geohashes in Solr; my latest work is LSP: http://code.google.com/p/lucene-spatial-playground/ which has various indexing strategies, including geohashes. If you search for my name and geohash, you'll find various material.

It sounds like what you are after is essentially a geohash-based heatmap. That is something on my TODO list for LSP, but in the meantime you can get it with a little manipulation of how you index the geohashes. After edge n-gramming your geohash, prefix each resulting geohash with a leading number that is its length. For example, instead of just "16", index "216". Write the length in hexadecimal so you can get 16 values in one character, instead of decimal's 10. When faceting, use facet.prefix=2 to get the counts for the two-character cells.
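As a rough illustration of the indexing trick described above, here is a small Python sketch. It is not LSP or Solr code: the function names and the field name `geohash_prefixed` are hypothetical, and `simulate_facet` only imitates in memory what faceting with `facet.prefix` would return. The sample data is taken from the question.

```python
from collections import Counter
from urllib.parse import urlencode

def length_prefixed_ngrams(geohash):
    """Return each leading substring of `geohash`, prefixed by its length as one hex digit."""
    if not 1 <= len(geohash) <= 15:
        raise ValueError("a single hex digit only covers lengths 1 to 15")
    return [f"{n:x}{geohash[:n]}" for n in range(1, len(geohash) + 1)]

def simulate_facet(geohashes, precision):
    """Count the tokens a facet.prefix=<hex precision> request would match (in-memory stand-in)."""
    prefix = f"{precision:x}"
    counts = Counter()
    for geohash in geohashes:
        for token in length_prefixed_ngrams(geohash):
            if token.startswith(prefix):
                counts[token] += 1
    return dict(counts)

# One entry per document, rebuilt from the counts in the question.
docs = (["svztdm7w"] * 11 + ["sv87rzt8"] * 3 + ["sv83t6bf"] * 2 +
        ["syqxp43m"] * 4 + ["syr9f0v2"] * 4 + ["syp8p8hb"] * 3 +
        ["tuuttmtt"] * 3 + ["twj1ynm3"] * 3 + ["w30n6u71"] * 3)

print(length_prefixed_ngrams("svztdm7w"))
print(simulate_facet(docs, 2))

# Query-string parameters for the matching facet request.
# "geohash_prefixed" is an assumed field name, not one from the question.
params = urlencode({
    "q": "*:*",
    "rows": 0,
    "facet": "true",
    "facet.field": "geohash_prefixed",
    "facet.prefix": "2",
})
print(params)
```

Strip the first character from each returned facet value to recover the plain geohash cell. Note that a single hex digit limits this scheme to geohashes of at most 15 characters.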

Good luck and keep in touch.

David Smiley
  • Yes, a heatmap - got it in one! That works great. It means my indexing is a little clunky (I used separate regex matches to produce each of the prefixes; not sure if this is the best way?). On that note, is there a simple way to get Solr to calculate the geohash and make it available for faceting, or must it be supplied? – Sensai Jan 12 '12 at 00:46
  • Solr has built-in geohashing with the GeoHashField, but it doesn't have the length prefix. I don't see how or why you used regexes; simply compute the length of the string and prepend it in hex. I would put all this logic into an UpdateRequestProcessor and index the field as a String. – David Smiley Jan 12 '12 at 16:06