
I'm trying to set up an ElasticSearch index with different analyzers for the individual fields. However, I can't seem to find a way to set field-specific analyzers; here's how I create my (test) index:

curl -XPOST localhost:9200/twitter
curl -XPUT 'http://localhost:9200/twitter/tweet/_mapping' -d '
{
    "tweet" : {
        "properties" : {
            "message" : {
                "type" : "string",
                "search_analyzer" : "snowball", 
                "index_analyzer" : "snowball"
            }
        }
    }
}'

If I read the documentation correctly, then this should create the index 'twitter' with the type 'tweet', and content for the 'message' field should be analyzed through the snowball stemming analyzer. To test for this, I tried the following queries:

curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
    "message" : "Look, a fighting War-Unicorn!"
}'
curl -XGET 'localhost:9200/twitter/_search?q=fight'

If I'm not mistaken, this should return a hit, as "fight" is the stem of "fighting". The problem is that it doesn't: I'm getting zero hits. It appears as if ElasticSearch ignores the mapping entirely, even though it accepts all of these requests (I get 'ok' back for each of them).

I've already tried replacing the default analyzer with a snowball analyzer, and then it works; thing is, I totally need to have field-specific analyzers, so this isn't going to help me. I also tried different analyzers and things like setting "index" to "no", but to no avail.

What am I doing wrong?

Felix

2 Answers


To use a field-specific analyzer, you need to specify that field in the query. Otherwise the query runs against the special "_all" field, which is analyzed with the default analyzer. Try

curl -XGET 'localhost:9200/twitter/_search?q=message:fight'

or

curl -XGET 'localhost:9200/twitter/_search?df=message&q=looking'
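
To make the difference concrete, here is a rough toy model in Python of why the field matters. It is only a sketch: `default_analyze` and `toy_snowball_analyze` are made-up stand-ins (the second one just drops a few stopwords and strips a trailing "ing"), not the real ElasticSearch analyzers, and the `index_document` and `search` helpers are hypothetical names for illustration.

```python
import re

# Toy stand-ins for the analyzers; NOT the real ElasticSearch implementations.
STOPWORDS = {"a", "an", "the"}


def default_analyze(text):
    """Lowercase and split on non-letters; no stemming."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def toy_snowball_analyze(text):
    """Like default_analyze, but drops stopwords and strips a trailing 'ing'."""
    tokens = [t for t in default_analyze(text) if t not in STOPWORDS]
    return [t[:-3] if t.endswith("ing") and len(t) > 5 else t for t in tokens]


# The "message" field uses the stemming analyzer; "_all" uses the default one.
ANALYZERS = {"message": toy_snowball_analyze, "_all": default_analyze}


def index_document(message):
    """Index the same text once per field, each with that field's analyzer."""
    return {field: set(analyze(message)) for field, analyze in ANALYZERS.items()}


def search(index, query, field="_all"):
    """Analyze the query with the field's analyzer and look up its tokens."""
    analyze = ANALYZERS[field]
    return all(token in index[field] for token in analyze(query))


index = index_document("Look, a fighting War-Unicorn!")
print(search(index, "fight"))               # no field given: searches "_all"
print(search(index, "fight", "message"))    # like q=message:fight
print(search(index, "looking", "message"))  # like df=message&q=looking
```

In this model the first search misses because "_all" only holds the unstemmed token "fighting", while the two searches against "message" hit because both the indexed text and the query go through the same stemming step.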
imotov
  • 1
    ok, that's for the search analyzer .. but shouldn't the snowball index_analyzer reduce "fighting" to "fight", as it does when running the sentence directly through the analyze API? In that case, searching for "fight" would return a hit, regardless of the search analyzer used, wouldn't it? And, more importantly, what if I don't know yet which field to search in? I only have one here, alright, but I'll need to have several in the end, and search them. – Felix Jun 03 '11 at 17:16
  • 1
    The snowball analyzer does indeed reduce "fighting" to "fight" in the "message" field. However, if you don't specify a field in your search, you are searching the special "_all" field, which indexes the content of the "message" field (and all other fields, if you had any), but this content is analyzed using the default analyzer. – imotov Jun 03 '11 at 17:41
  • ok, that i understand - thanks for explaining, @imotov! still ... if the index analyzer reduces "fighting" to "fight", wouldn't "fight" be the token that ES saves and indexes and checks against search queries? thus, wouldn't searching for "fight" using the standard analyzer return a match nonetheless, if the text was indexed using the snowball analyzer? – Felix Jun 03 '11 at 22:45
  • 5
    When your message is indexed, it's indexed as a document with a few fields: `{message:["look","fight","war","unicorn"], _all:["look","a","fighting","war","unicorn"],_type:"tweet",...}`. Without a search field specified, your query is translated into `_all:fight`. Please notice that because the field `_all` was analyzed using the default analyzer, it **doesn't** contain the token `fight`, and that's why the document doesn't show up in the results. In other words, your message is indexed twice using two different analyzers (default and snowball), and you are searching the version analyzed by the default analyzer. – imotov Jun 04 '11 at 02:18
  • Precisely. Note that you can also set the analyzer for this special `_all` field in your mapping. Then you don't have to set either the default field or the analyzer in your query. – karmi Jun 04 '11 at 06:24
  • That makes sense, thanks for explaining! @imotov: one more question, is there a query to get the JSON you showed in your comment? – Felix Jun 06 '11 at 07:32
  • 1
    Not really. This "JSON" is a simplification of what's actually indexed. But you can get pieces of it by calling `curl -XGET 'localhost:9200/twitter/_analyze?analyzer=default&pretty=true' -d 'Look, a fighting War-Unicorn!'` and `curl -XGET 'localhost:9200/twitter/_analyze?analyzer=snowball&pretty=true' -d 'Look, a fighting War-Unicorn!'` – imotov Jun 06 '11 at 13:47

I recommend using https://github.com/lmenezes/elasticsearch-kopf. With it you can test all analyzers, copy a mapping from another index, and monitor your indexes. See also http://www.elasticsearch.org/guide/en/elasticsearch/client/community/current/health.html

AhmedAlawady