
I have two files, st.txt and sy.txt.

st.txt

was
an

sy.txt

football,soccer

The settings are below

new_player_settings = {
    "settings": {
        "index": {
            "analysis": {
                "filter": {
                    "synonym_en": {
                        "type": "synonym",
                        "synonyms_path": "sy.txt"
                    },
                    "english_stop": {
                        "type": "stop",
                        "stopwords_path": "st.txt"
                    }
                },
                "analyzer": {
                    "english_analyzer": {
                        "tokenizer": "standard",
                        "filter": [
                            "english_stop",
                            "synonym_en"
                        ]
                    }
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "english_analyzer"
            },
            "description": {
                "type": "text",
                "analyzer": "english_analyzer"
            }
        }
    }
}

My data is below

abc = [
{'id':1, 'name': 'christiano ronaldo', 'description': 'football@fifa.com', 'type': 'football'},
{'id':2, 'name': 'lionel messi', 'description': 'soccer@fifa.com','type': 'soccer'},
{'id':3, 'name': 'sachin', 'description': 'was', 'type': 'cricket'}
]

DSL query is below

{
    "query": {
        "query_string": {
            "fields": ["name^2", "description^2", "type^4"],
            "query": "was football"
        }
    }
}

My output

{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 3.9233165,
  'hits': [{'_index': 'newplayers',
    '_type': '_doc',
    '_id': '1',
    '_score': 3.9233165,
    '_source': {'id': 1,
     'name': 'christiano ronaldo',
     'description': 'football@fifa.com',
     'type': 'football'}},
   {'_index': 'newplayers',
    '_type': '_doc',
    '_id': '3',
    '_score': 2.345461,
    '_source': {'id': 3,
     'name': 'sachin',
     'description': 'was',
     'type': 'cricket'}}]}}

Expected output

Document id 3 should not be present, because `was` is a stopword. Document id 2 should be present, because the synonym file makes football and soccer equivalent.
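To make that expectation concrete, here is a rough pure-Python sketch of what the query text should turn into once the stop filter and then the synonym filter have run. It is not Elasticsearch's analyzer: the whitespace split, the `analyze_query` helper, and the hard-coded lists are assumptions for illustration, copied from st.txt and sy.txt above.

```python
# Contents of st.txt and sy.txt, hard-coded for this illustration.
STOPWORDS = {"was", "an"}
SYNONYM_GROUPS = [["football", "soccer"]]


def analyze_query(text):
    """Mimic stop filter then synonym filter on whitespace-split tokens."""
    tokens = text.split()
    # Stop filter: drop every token listed in st.txt.
    kept = [t for t in tokens if t not in STOPWORDS]
    # Synonym filter: expand each token to its whole synonym group.
    expanded = []
    for token in kept:
        group = next((g for g in SYNONYM_GROUPS if token in g), [token])
        for term in group:
            if term not in expanded:
                expanded.append(term)
    return expanded


print(analyze_query("was football"))  # ['football', 'soccer']
```

If the analyzer really behaves this way, `was` never reaches the index lookup for the analyzed fields, and both football and soccer documents match.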

Expected

{'took': 2,
 'timed_out': False,
 '_shards': {'total': 1, 'successful': 1, 'skipped': 0, 'failed': 0},
 'hits': {'total': {'value': 2, 'relation': 'eq'},
  'max_score': 2.0,
  'hits': [{'_index': 'players',
    '_type': '_doc',
    '_id': '1',
    '_score': 2.0,
    '_source': {'id': 1,
     'name': 'christiano ronaldo',
     'description': 'football@fifa.com',
     'type': 'football'}},
   {'_index': 'players',
    '_type': '_doc',
    '_id': '2',
    '_score': 2.0,
    '_source': {'id': 2,
     'name': 'lionel messi',
     'description': 'soccer@fifa.com',
     'type': 'soccer'}}]}}
sim

1 Answer


The issue may be that the st.txt and sy.txt files, which define your index's stopwords and synonyms, are not present in the Elasticsearch cluster. I tried the same settings and mappings with the sample data you provided and got your expected output, as shown below.

Search query

{
    "query": {
        "query_string": {
            "fields": [
                "name^2",
                "description^2",
                "type^4"
            ],
            "query": "was football"
        }
    }
}

And search result with source JSON

"hits": [
            {
                "_index": "72796944",
                "_type": "_doc",
                "_id": "1",
                "_score": 0.4051987,
                "_source": {
                    "name": "christiano ronaldo",
                    "description": "football@fifa.com"
                }
            },
            {
                "_index": "72796944",
                "_type": "_doc",
                "_id": "2",
                "_score": 0.4051987,
                "_source": {
                    "name": "lionel messi",
                    "description": "soccer@fifa.com"
                }
            }
        ]

It would be great if you could share the output of the explain API, which you can get by appending ?explain=true to your search endpoint, so this can be debugged further.
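If the search is sent from Python rather than by URL, the same explanation can be requested by setting `explain` in the request body. A minimal sketch; the `with_explain` helper is a hypothetical name, not part of any client library:

```python
import json


def with_explain(search_body):
    """Return a copy of a search body with per-hit score explanations on."""
    body = dict(search_body)  # shallow copy so the original stays unchanged
    body["explain"] = True
    return body


query = {
    "query": {
        "query_string": {
            "fields": ["name^2", "description^2", "type^4"],
            "query": "was football",
        }
    }
}

explained_query = with_explain(query)
print(json.dumps(explained_query, indent=2))
```

Each hit in the response should then carry an `_explanation` section showing which field and term produced the match, which would reveal whether `was` is matching through `description` or through another field.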

Update: As discussed in the comments, the issue does not occur when these words are defined in the settings themselves, so the problem is that the file content is not being updated properly in Elasticsearch.
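For reference, a sketch of defining the words in the settings themselves, using the `stopwords` and `synonyms` parameters in place of `stopwords_path` and `synonyms_path`. The helper names `read_rules` and `build_inline_analysis` are hypothetical, and reading the lists from local copies of the files is an assumption about how you would fill them in:

```python
from pathlib import Path


def read_rules(path):
    """Return the non-empty lines of a local stopword or synonym file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def build_inline_analysis(stopwords, synonyms):
    """Build the "analysis" settings with the word lists inlined."""
    return {
        "filter": {
            "synonym_en": {"type": "synonym", "synonyms": list(synonyms)},
            "english_stop": {"type": "stop", "stopwords": list(stopwords)},
        },
        "analyzer": {
            "english_analyzer": {
                "tokenizer": "standard",
                "filter": ["english_stop", "synonym_en"],
            }
        },
    }


# Values from the question, written out instead of read from disk.
analysis = build_inline_analysis(["was", "an"], ["football,soccer"])
new_player_settings = {"settings": {"index": {"analysis": analysis}}}
```

With real files, `build_inline_analysis(read_rules("st.txt"), read_rules("sy.txt"))` would produce the same structure. This suits short lists; for very large lists the file-based settings are likely the better fit.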

Amit
  • can i have different file in index right? or in entire cluster i should not have st.txt only one time i could use st.txt – sim Jul 11 '22 at 07:53
  • @sim, sorry didn't get you, but instead of file you can have these words as these are less, as part of your index setting itself, adding the example of it in my answer – Amit Jul 11 '22 at 07:54
  • can you explain this `Maybe issue is that sy and st text files which defines your index stop and synonyms are not present in the Elasticsearch cluster,` – sim Jul 11 '22 at 07:56
  • I got output if I am adding directly to the settings, but my file size is huge, that is why I am adding files – sim Jul 11 '22 at 07:57
  • @sim, ahh i see, so it means issue is the file, how are you adding these files to Elasticsearch cluster? – Amit Jul 11 '22 at 07:58
  • its from python api – sim Jul 11 '22 at 07:58
  • @sim, no you need to manually store these files in the Elasticsearch nodes – Amit Jul 11 '22 at 07:59
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/246336/discussion-between-amit-and-sim). – Amit Jul 11 '22 at 08:00
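As the comments above note, the files have to sit on the Elasticsearch nodes themselves, and relative `stopwords_path` and `synonyms_path` values are resolved against each node's config directory. A minimal sketch of copying them into place, to be run on every node; the `install_analysis_files` helper is hypothetical and the config directory location depends on your installation:

```python
import shutil
from pathlib import Path


def install_analysis_files(source_files, config_dir):
    """Copy stopword/synonym files into a node's config directory."""
    config_dir = Path(config_dir)
    if not config_dir.is_dir():
        raise FileNotFoundError(f"config directory not found: {config_dir}")
    installed = []
    for src in source_files:
        dest = config_dir / Path(src).name  # keep the name used in the settings
        shutil.copy2(src, dest)
        installed.append(dest)
    return installed


# Example call (the path is an assumption, adjust it to your install):
# install_analysis_files(["st.txt", "sy.txt"], "/etc/elasticsearch")
```

An index that was created before the files were updated may keep using the old content, so it may need to be closed and reopened, or re-created, before the new words take effect.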