Banana Dashboard For Solr Not Tokenizing Location Names Correctly

Question

I am using the Banana dashboard to build a non-time-series dashboard for my Solr-indexed data. The "location" field doesn't display correctly in the Banana facets widget: names like "San Francisco" and "New York" are shown as "San" and "Francisco", and "New" and "York".

However, when I cross-check my Solr query results, these values are correctly shown as single entities: "San Francisco" and "New York".

In the Solr core, the managed-schema.xml file has the following entries:

<field name="content" type="opennlp-en-tokenization" indexed="true" stored="true" multiValued="true"/>
<field name="person" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="organization" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="location" type="text_general" indexed="true" stored="true" multiValued="true"/>

Any idea where I might be going wrong?

[Screenshot: Banana dashboard with location names containing a space wrongly tokenized as two different places]

[Screenshot: Solr dashboard with location names containing a space correctly shown as one single location]

That’s probably because your field is indexed with some tokenizer, while the Solr query results show the stored values? — Mysterion, Jan 18 '19 at 11:47
@Mysterion I have updated the question with the configuration details for the core. I am using the OpenNLPTokenizer. But isn't the same configuration also used for tokenizing the indexes in Solr? Solr refers to the entries in the core's managed-schema.xml to index and store. — raikumardipak, Jan 21 '19 at 06:08
Your `location` field has `text_general` as its tokenizer. That will split the input into multiple tokens, ending up with the result you're showing. Change it to a `string` field or use a `KeywordTokenizer` (if you need to process it in any way). If you want to still be able to use the field for searching without having to have an exact match, define another field as the string field and facet on that, and use `copyField` to copy the content into both fields. — MatsLindh, Jan 21 '19 at 09:52
@MatsLindh Changing it to string worked! However, as in the screenshot above, Solr was correctly showing the location attribute as "New York" instead of "New" and "York" even when the field type was "text_general". What am I missing here? — raikumardipak, Jan 21 '19 at 10:51
What's being shown is not the same as what's stored as tokens behind the scenes. Faceting works on the tokens, not the stored text. — MatsLindh, Jan 21 '19 at 10:54

score 1 · Accepted Answer · answered Jan 21 '19 at 10:55

Your location field uses the text_general field type, whose analyzer tokenizes the input. That splits each value into multiple tokens, which produces the result you're seeing.

Change it to a string field or use a KeywordTokenizer (if you need to process it in any way). If you want to still be able to use the field for searching without having to have an exact match, define another field as the string field and facet on that, and use copyField to copy the content into both fields.
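
As a rough sketch of the two-field setup described above, the schema could look something like the following. The names `location_facet` and `location_keyword` are hypothetical and chosen only for illustration, and the sketch assumes the core's schema already defines the stock `string` field type. The last block shows the KeywordTokenizer alternative with a lowercase filter as one example of "processing"; it is not something the original schema contains.

```xml
<!-- Searchable field: still tokenized, as in the question -->
<field name="location" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- Hypothetical facet field: "string" keeps each value as a single token -->
<field name="location_facet" type="string" indexed="true" stored="false" multiValued="true"/>

<!-- Copy the raw input of "location" into the facet field at index time -->
<copyField source="location" dest="location_facet"/>

<!-- Alternative (hypothetical type name): the whole value stays one token,
     but filters such as lowercasing can still be applied -->
<fieldType name="location_keyword" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a setup like this, the Banana facet widget would point at `location_facet` while searches keep using `location`. Existing documents need to be reindexed after the schema change before the new field has any values. If the lowercasing variant is used, the facet labels will also appear in lowercase, since facets show the indexed tokens.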

The reason is that faceting uses the indexed tokens to generate the counts, not the stored text for the field (which is what you see when you query the document). The tokens are not directly visible (except when faceting or retrieving terms), but you can see how your content is processed, and which tokens your input ends up as, on the "Analysis" page in the Solr Admin UI.
