I use Elasticsearch to search news articles. If I search for "Vladimir Putin", it works, because he is in the news a lot and neither "Vladimir" nor "Putin" is a very common name. But if I search for "Raja Ram", it does not work. I have a few articles about "Raja Ram", but also some about "Raja Mohanty" and "Ram Srivastava", and these rank higher than the articles that mention "Raja Ram". Is something wrong with my tokenizer or my search function?

    es.indices.create(
            index="article-index",
            body={
                    'settings': {
                            'analysis': {
                                    'analyzer': {
                                            'my_ngram_analyzer' : {
                                                    'tokenizer' : 'my_ngram_tokenizer'
                                            }
                                    },
                                    'tokenizer' : {
                                            'my_ngram_tokenizer' : {
                                                    'type' : 'nGram',
                                                    'min_gram' : '1',
                                                    'max_gram' : '50'
                                            }
                                    }
                            }
                    }
            },
            # ignore already existing index
            ignore=400
    )

    res = es.search(index="article-index", fields="url", body={"query": {"query_string": {"query": keywordstr, "fields": ["text", "title", "tags", "domain"]}}})

cheffe
Pratik Poddar

1 Answer

You can use Elasticsearch's match_phrase query, which only matches documents where the terms appear together and in the same order.

However, match_phrase takes a single field, so you can't list multiple fields; search the _all field instead.

Your query would be:

    res = es.search(index="article-index", fields="url", body={"query": {"match_phrase": {"_all": keywordstr}}})
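
If the phrase has to be matched against the individual fields from the question rather than _all, one possible workaround is a bool query with one match_phrase clause per field. This is only a sketch: build_phrase_query is a hypothetical helper (not part of the Elasticsearch client), and the default field names are simply the ones from the question.

```python
def build_phrase_query(keywords, fields=("text", "title", "tags", "domain")):
    """Build a search body that phrase-matches `keywords` in any of `fields`.

    Hypothetical helper for illustration; the field names are assumed
    from the question's query_string call.
    """
    # One match_phrase clause per field.
    clauses = [{"match_phrase": {field: keywords}} for field in fields]
    return {
        "query": {
            "bool": {
                "should": clauses,
                # A document must match the phrase in at least one field;
                # matching in several fields raises its score.
                "minimum_should_match": 1,
            }
        }
    }


body = build_phrase_query("Raja Ram")
# Assuming `es` is the client from the question:
# res = es.search(index="article-index", fields="url", body=body)
```

The body is built separately from the search call so it can be inspected or logged before it is sent.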

harsha
  • It worked, but only for exact matches. For a search on "A B C", if "A B C" is there, great; otherwise the search should give more importance to "A B" and "B C" than to "A" or "B" on their own. How can I make that happen? – Pratik Poddar Apr 05 '14 at 19:36
  • First query for A B C using query_string and the AND operator. This will give you all three possible kinds of results: 1. match_phrase of A B C, 2. A AND B, B AND C, or C AND A, 3. A OR B OR C. You can then separate these results and boost each kind accordingly. – harsha Apr 07 '14 at 13:51
  • Too messy if we are dealing with 10 words :( – Pratik Poddar Apr 07 '14 at 18:58
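
The ranking asked for in the comments could perhaps be generated in code rather than written by hand, which keeps it manageable for longer queries. The sketch below builds a single bool query: a match clause lets any single word qualify a document, and one match_phrase clause per contiguous sub-phrase adds to the score. Several things here are assumptions: build_ranked_query is a hypothetical helper, using the word count as the boost is an arbitrary choice, and the field is assumed to be analyzed into whole words rather than into the character n-grams configured in the question.

```python
def build_ranked_query(keywords, field="_all"):
    """Build a search body that favours longer contiguous sub-phrases.

    Hypothetical helper for illustration; boosts are an arbitrary choice.
    """
    words = keywords.split()
    should = []
    # Longest sub-phrases first; each is boosted by its word count.
    for size in range(len(words), 1, -1):
        for start in range(len(words) - size + 1):
            phrase = " ".join(words[start:start + size])
            should.append(
                {"match_phrase": {field: {"query": phrase, "boost": size}}}
            )
    return {
        "query": {
            "bool": {
                # Any single word is enough for a document to match ...
                "must": {"match": {field: keywords}},
                # ... and every matching sub-phrase adds to its score.
                "should": should,
            }
        }
    }


body = build_ranked_query("A B C")
# Sub-phrases generated for "A B C": "A B C", "A B", "B C"
```

For a 10-word query this produces 45 sub-phrase clauses, which may or may not be acceptable depending on index size.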