
New to ES, so this may be a dumb question, but I am trying to search using a wildcard, e.g.: "SOMECODE*" and "*SOMECODE".

It works fine, but the value in the document may be "SOMECODE/FRED".
The problem is that * matches anything (including nothing),
so *SOMECODE gets a hit on SOMECODE/FRED.

I tried searching for */SOMECODE, but this returns nothing.
I think the tokenization of the field is the root problem,
i.e. the / splits the value into two words.

I tried setting the mapping on the field to not_analyzed, but then I can't search on it at all.

Am I doing it wrong?

Thanks

Eric Francis
Jonesie
  • You should be able to search on the field anyway, unless it is indexed=false in your mapping. – javanna Jan 30 '13 at 09:40
  • It definitely didn't work with not_analyzed, but I am using queries rather than filters. Maybe that's why? Far out, this is complicated! – Jonesie Jan 30 '13 at 20:29

1 Answer


By setting not_analyzed, you are only allowing exact matches (e.g. "SOMECODE/FRED" only, including case and special characters).
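As a sketch (the index, type, and field names here are made up), a not_analyzed field only matches a term query containing the byte-for-byte original value:

    # Hit: the stored term is exactly "SOMECODE/FRED"
    $ curl -XGET 'localhost:9200/myindex/mytype/_search' -d '{
      "query" : { "term" : { "code" : "SOMECODE/FRED" } }
    }'

    # No hit: lowercased or partial values do not match the stored term
    $ curl -XGET 'localhost:9200/myindex/mytype/_search' -d '{
      "query" : { "term" : { "code" : "somecode/fred" } }
    }'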

My guess is that you are using the standard analyzer (it is the default if you don't specify one). If that's the case, the standard analyzer treats the slash as a token separator and generates two tokens, [somecode] and [fred]:

$ curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'SOMECODE/FRED'
{
  "tokens" : [ {
    "token" : "somecode",
    "start_offset" : 0,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "fred",
    "start_offset" : 9,
    "end_offset" : 13,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

If you don't want this behavior, you need to switch to a tokenizer that doesn't split on special characters. However, I would question the use case for this. Generally, you'll want to split on those types of characters.
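For example (a sketch only; the index, type, field, and analyzer names are made up), you could define a custom analyzer built on the whitespace tokenizer, which keeps SOMECODE/FRED as a single lowercased token:

    $ curl -XPUT 'localhost:9200/myindex' -d '{
      "settings" : {
        "analysis" : {
          "analyzer" : {
            "code_analyzer" : {
              "type" : "custom",
              "tokenizer" : "whitespace",
              "filter" : [ "lowercase" ]
            }
          }
        }
      },
      "mappings" : {
        "mytype" : {
          "properties" : {
            "code" : { "type" : "string", "analyzer" : "code_analyzer" }
          }
        }
      }
    }'

Note that wildcard queries are not analyzed, so with the lowercase filter in place you would need to search for */fred rather than */FRED.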

Aminah Nuraini
Zach
  • Thanks, this makes perfect sense now. Perhaps you should be writing the documentation for ES :) I just need to figure out how to set the tokenizer for a single field. I guess I do this in the mapping? – Jonesie Jan 31 '13 at 01:27
  • 1
    Yep! You can set the mapping of a field either when you create the index (http://www.elasticsearch.org/guide/reference/api/admin-indices-create-index.html), or after the index is created with Put Mapping API (http://www.elasticsearch.org/guide/reference/api/admin-indices-put-mapping.html). You may have to delete your data or make a new index...ES doesn't allow you to alter the mapping of an existing field. – Zach Jan 31 '13 at 12:46
  • I've set the field in my index mapping to not_analyzed, but it still won't find the values I want. – Jonesie Feb 14 '13 at 01:59
  • Are you searching with the exact case? Maybe put together a gist of the entire process (index creation, mapping, data, etc.)? It's a lot easier to help if you have all the steps required to debug it. – Zach Feb 14 '13 at 11:15
  • I managed to solve this by talking about it (http://stackoverflow.com/questions/14866727/elasticsearch-and-not-analyzed-still-cant-find-stuff/14867293#14867293). – Jonesie Feb 14 '13 at 18:38
  • And sorry, I don't gist, but I may do a blog post about the whole thing at some point. – Jonesie Feb 14 '13 at 18:39
  • Now you can not analyze and still do a prefix search. see https://www.elastic.co/guide/en/elasticsearch/guide/current/_postcodes_and_structured_data.html – Jonathan Hendler Jul 21 '17 at 01:11