To improve the search results I get from Elasticsearch, I want to extend my stop word list from my Java code. So far I have been using the default list of the stop analyzer, which does not include interrogative words such as "what", "who", and "why". We want to remove these words, and some additional ones, from our searches when querying. I tried the code from here (the last answer):

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "standard",
          "stopwords": [ "and", "the" ]
        }
      }
    }
  }
}

I tried this in Java, but it wasn't working for me.

My main question: how do I create my own list of stopwords, and how do I use it in my code with the query below?

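import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.QueryStringQueryBuilder;
import org.elasticsearch.search.SearchHit;

// Analyze the query text with the built-in "stop" analyzer and search four fields
QueryStringQueryBuilder qb = new QueryStringQueryBuilder(text).analyzer("stop");
qb.field("question_title");
qb.field("level");
qb.field("category");
qb.field("question_tags");
SearchResponse response = client.prepareSearch("questionindex")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setQuery(qb)
        .execute()
        .actionGet();
SearchHit[] results = response.getHits().getHits();
// Print the number of hits returned
System.out.println("response-" + results.length);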

Currently I am using the default stop analyzer, which only removes a limited set of stop words:

"a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"

But I want to extend this list.

Testing Test
You need to add a StopFilter to use custom stopwords. Could you share the Java code you have developed so far? – Utkarsh Jul 25 '15 at 13:20

1 Answer

You're on the right track. In your first listing (from the documentation about stopwords) you created a custom analyzer called my_analyzer for the index called my_index, which removes "and" and "the" from any text that my_analyzer is applied to.

Now to actually use it, you should:

  1. Make sure that you've defined my_analyzer on the index you're querying (questionindex?).
  2. Create a mapping for your documents that uses my_analyzer for the fields where you would like to remove "and" and "the" (for example, the question_title field).
  3. Test out your analyzer using the Analyze API:

    GET /questionindex/_analyze?field=question.question_title&text=No quick brown fox jumps over my lazy dog and the indolent cat

  4. Reindex your documents.
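
To make the expected outcome of the Analyze API test more concrete, here is a rough plain-Java sketch of what lowercasing plus removal of "and" and "the" does to that sample sentence. It is not Elasticsearch code: the class StopwordPreview and its analyze method are made up for illustration, and the regex split only approximates what the real tokenizer does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for a "standard" analyzer with custom stopwords.
// It only mimics lowercasing and stopword removal; it does not call Elasticsearch.
final class StopwordPreview {

    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        // Split on anything that is not a letter or digit. This is a
        // simplification of the real tokenizer's word-boundary rules.
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            // Keep the token only if it is non-empty and not a stopword.
            if (!token.isEmpty() && !stopwords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Same two stopwords as in my_analyzer.
        Set<String> stopwords = new HashSet<>(Arrays.asList("and", "the"));
        String text = "No quick brown fox jumps over my lazy dog and the indolent cat";
        System.out.println(analyze(text, stopwords));
    }
}
```

Running this prints [no, quick, brown, fox, jumps, over, my, lazy, dog, indolent, cat]. Note that "no" is kept, because only "and" and "the" are configured as stopwords here.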


Try this as a starting point:

POST /questionindex
{
    "settings" : {
        "analysis": {
            "analyzer": {
                "my_analyzer": { 
                    "type": "standard", 
                    "stopwords": [ "and", "the" ] 
                }
            }
        }
    },
    "mappings" : {
        "question" : {
            "properties" : {
                "question_title" : { 
                    "type" : "string", 
                    "analyzer" : "my_analyzer" 
                },
                "level" : { 
                    "type" : "integer" 
                },
                "category" : { 
                    "type" : "string" 
                },
                "question_tags" : { 
                    "type" : "string" 
                }
            }
        }
    }
}
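
Since the goal in the question is a much longer stopword list, one option is to generate the settings part of the JSON above from Java instead of editing it by hand. The sketch below merges the default words quoted in the question with some extras. StopwordSettings, buildSettingsJson and the extra words are hypothetical names and values chosen for illustration, and the words are assumed to be plain lowercase text (no JSON escaping is done).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper that builds an index settings body with an extended stopword list.
final class StopwordSettings {

    // The default stop analyzer words, as quoted in the question.
    static final List<String> DEFAULT_STOPWORDS = Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
            "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
            "the", "their", "then", "there", "these", "they", "this", "to", "was",
            "will", "with");

    // Example extras (an assumption; pick whatever suits your data).
    static final List<String> EXTRA_STOPWORDS = Arrays.asList(
            "what", "who", "why", "when", "where", "which", "how");

    static String buildSettingsJson(String analyzerName, List<String> extraWords) {
        // LinkedHashSet keeps insertion order and drops duplicates.
        Set<String> all = new LinkedHashSet<>(DEFAULT_STOPWORDS);
        all.addAll(extraWords);
        // Quote each word and join them into a JSON array body.
        String words = all.stream()
                .map(w -> "\"" + w + "\"")
                .collect(Collectors.joining(", "));
        return "{\"settings\": {\"analysis\": {\"analyzer\": {\""
                + analyzerName
                + "\": {\"type\": \"standard\", \"stopwords\": ["
                + words
                + "]}}}}}";
    }

    public static void main(String[] args) {
        System.out.println(buildSettingsJson("my_analyzer", EXTRA_STOPWORDS));
    }
}
```

This only produces the settings part; the mappings would still need to be supplied when creating the index. The query in the question would then presumably name my_analyzer instead of "stop" if you also want the query text analyzed the same way.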
Peter Dixon-Moses