Questions tagged [elasticsearch-analyzers]

145 questions
0
votes
1 answer

Adding exclusions to multi-word synonyms in Elasticsearch

I have the following synonyms (just for this example) "synonyms": { "type": "synonym_graph", "expand": true, "lenient": true, "tokenizer": "standard", "synonyms": [ "french => french, ethnicity", "toast => toast, cheese sandwich" ]} What…
Lior Magen
  • 1,533
  • 2
  • 15
  • 33
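One common way to carve out an exception, assuming the setup in the excerpt above: synonym rules match longest-first, so adding a self-mapping rule for the multi-word phrase stops the single-word rule from firing inside it. The "french toast => french toast" rule below is a hypothetical illustration, not taken from the question:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms": {
          "type": "synonym_graph",
          "expand": true,
          "lenient": true,
          "synonyms": [
            "french toast => french toast",
            "french => french, ethnicity",
            "toast => toast, cheese sandwich"
          ]
        }
      }
    }
  }
}
```

Because the phrase rule maps to itself, "french toast" stays intact instead of expanding to "ethnicity toast". Behaviour is easy to verify with the `_analyze` API.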
0
votes
2 answers

Prioritizing original terms over synonyms in Elasticsearch

In my analyzer pipeline I have a synonyms step. Let's say that I have the following synonyms: beverage, drink. Now let's say that a user searches for 'beverage'; the user will get documents that contain 'beverage' or 'drink' without any…
Lior Magen
  • 1,533
  • 2
  • 15
  • 33
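A common pattern for this (a sketch, not from the question itself): index the field twice via multi-fields, one subfield analyzed with synonyms and one without, then boost the synonym-free subfield so exact matches rank higher. The field names `body` and `body.exact` are hypothetical:

```json
{
  "query": {
    "bool": {
      "should": [
        { "match": { "body.exact": { "query": "beverage", "boost": 2 } } },
        { "match": { "body": { "query": "beverage" } } }
      ]
    }
  }
}
```

Documents matching the literal term score from both clauses, while synonym-only matches score from one.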
0
votes
1 answer

What types are best for an Elasticsearch "KEYWORDS" (hashtag-like) field?

I want to make an Elasticsearch index for KEYWORDS, like hashtags, and make a synonym filter for the KEYWORDs. I think there are two ways of indexing the keyword; the first is to make a keyword type. { "settings": { "keywordField": { "type":…
JYL
  • 193
  • 2
  • 16
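For reference: `keyword` fields only support normalizers, and normalizers cannot contain synonym filters, so a common workaround is a `text` field with the `keyword` tokenizer, which keeps each hashtag as a single token while still allowing token filters. A sketch with hypothetical names:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "tag_synonyms": {
          "type": "synonym",
          "synonyms": ["es, elasticsearch"]
        }
      },
      "analyzer": {
        "hashtag_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase", "tag_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keywords": { "type": "text", "analyzer": "hashtag_analyzer" }
    }
  }
}
```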
0
votes
1 answer

Python Elasticsearch: Errors when trying to apply an analyzer to Index documents

So I'm trying to apply an analyzer to my index but no matter what I do I get some sort of error. I've been looking stuff up all day but can't get it to work. If I run it as it is below, I get an error which says …
L M
  • 65
  • 1
  • 7
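A frequent cause of such errors is defining the analyzer outside `settings.analysis`, or trying to add an analyzer to an index that is already open (analysis settings can only be set at creation time or while the index is closed). A minimal create-index body that usually works, with hypothetical names:

```json
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": { "type": "text", "analyzer": "my_analyzer" }
    }
  }
}
```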
0
votes
0 answers

Elasticsearch and product nomenclature: hyphens and spaces

I'm having a hard time figuring out how to set up Elasticsearch for the typical product model nomenclature. For instance, a product called "Shure SM7B" should appear as a result when searching for SM7B, SM 7B, SM 7, SM-7... and vice versa: searching…
Xavier
  • 1
  • 2
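One approach worth trying for model numbers (a sketch, under the assumption that matching on parts is acceptable): a `word_delimiter_graph` filter splits tokens on letter/number and case transitions, so "SM7B", "SM-7B" and "SM 7B" all reduce to comparable parts, and `catenate_all` additionally emits the joined form:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "model_parts": {
          "type": "word_delimiter_graph",
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "model_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["model_parts", "lowercase"]
        }
      }
    }
  }
}
```

Note the ordering: the delimiter filter runs before `lowercase`, otherwise case-change splits are lost. Results are worth checking with the `_analyze` API before indexing.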
0
votes
1 answer

Elasticsearch: ignore accents on search

I have an Elasticsearch index with customer information. I have some issues looking for results with accents. For example, I have {name: 'anais'} and {name: 'anaïs'}. Running GET /my-index/_search { "size": 25, "query": { "match": {"name":…
Ajouve
  • 9,735
  • 26
  • 90
  • 137
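The usual fix (a sketch, assuming the `name` field can be reindexed): add an `asciifolding` token filter so 'anaïs' and 'anais' produce the same token at both index and search time:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "folding" }
    }
  }
}
```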
0
votes
1 answer

Strange tokenization in Lucene 8 Brazilian Portuguese analyzers

I'm using Lucene 8.6.2 (currently the latest available) with AdoptOpenJDK 11 on Windows 10, and I'm having odd problems with the Portuguese and Brazilian Portuguese analyzers mangling the tokenization. Let's take a simple example: the first line of…
Garret Wilson
  • 18,219
  • 30
  • 144
  • 272
0
votes
1 answer

How to update the tokens of standard tokenizer

I am using the standard tokenizer in my Elasticsearch plugin. I need to iterate over each token from the standard tokenizer and write some encrypted text to the Lucene index instead. Is there any way to update the tokens of the standard tokenizer? Can anyone help?
Brisi
  • 1,781
  • 7
  • 26
  • 41
0
votes
1 answer

Using different language analyzers with ngram Analyzer in one mapping in Elasticsearch

I want to use English and German custom analyzers together with other analyzers, for example ngram. Is the following mapping correct? I am getting an error for the German analyzer: [unknown setting [index.filter.german_stop.type]. I searched but I did not…
yolo25
  • 45
  • 6
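That particular error message usually means the filter was defined directly under the index settings rather than nested under `analysis`. A sketch of the expected nesting (analyzer and tokenizer names are hypothetical):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "german_stop": { "type": "stop", "stopwords": "_german_" }
      },
      "tokenizer": {
        "my_ngram": { "type": "ngram", "min_gram": 2, "max_gram": 3 }
      },
      "analyzer": {
        "german_custom": {
          "tokenizer": "standard",
          "filter": ["lowercase", "german_stop"]
        },
        "ngram_analyzer": {
          "tokenizer": "my_ngram",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

Every custom filter, tokenizer and analyzer lives under its own sub-object of `analysis`; analyzers then reference them by name.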
0
votes
1 answer

Adding an analyzer to an Elasticsearch field of type array

I have an Elasticsearch object in which one field is an array type. Now I want to apply a different analyzer than the standard default one. When I pass an analyzer in the index definition, it throws an error. How can I do this? In the below example, skills…
Rajeev
  • 4,762
  • 8
  • 41
  • 63
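For context: Elasticsearch has no dedicated array type; any field accepts multiple values, so the analyzer is simply set on the field mapping. A minimal sketch for a `skills` field like the one mentioned above:

```json
{
  "mappings": {
    "properties": {
      "skills": { "type": "text", "analyzer": "english" }
    }
  }
}
```

A document can then supply `"skills": ["java", "python"]` with no array-specific mapping; each value is analyzed with the field's analyzer.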
0
votes
1 answer

"asciifolding" with tokenizer "pattern" in elasticsearch

Can anyone tell me why "asciifolding" doesn't work with the "pattern" tokenizer in my mapping below? I need to use the "pattern" tokenizer, but I also need to not differentiate between words with and without accents, the function that "asciifolding"…
Jean
  • 47
  • 6
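For comparison, `asciifolding` does combine with a pattern tokenizer when both are declared inside a custom analyzer; a working sketch (names hypothetical) that can be checked with the `_analyze` API:

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "split_punct": { "type": "pattern", "pattern": "[()., _-]" }
      },
      "analyzer": {
        "pattern_folded": {
          "type": "custom",
          "tokenizer": "split_punct",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
```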
0
votes
2 answers

Searchkick stemming

Using Searchkick, and I see that a search for "animals" is returning results for "anime" because of their shared stem "anim". Does anyone have any suggestions on how to improve these results? I see in the docs you can do something like exclude_queries = { …
0
votes
0 answers

Elasticsearch: combining multi-word tokens into a single token

Basically, let's say I have a list of phrases in a vocabulary: University of Texas Dallas, University of Tokyo, University of Toronto. Let's say I have 3 documents. doc1: I study at University of Texas Dallas and it's awesome. doc2: I study at…
Kaushik J
  • 962
  • 7
  • 17
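One way to fold known phrases into single tokens (a sketch based on the vocabulary listed in the question): a `synonym_graph` rule can map each phrase to an underscore-joined token:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "phrase_tokens": {
          "type": "synonym_graph",
          "synonyms": [
            "university of texas dallas => university_of_texas_dallas",
            "university of tokyo => university_of_tokyo",
            "university of toronto => university_of_toronto"
          ]
        }
      },
      "analyzer": {
        "phrase_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "phrase_tokens"]
        }
      }
    }
  }
}
```

The `lowercase` filter runs before the synonym filter so the lowercase rules match regardless of input casing.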
0
votes
1 answer

Elasticsearch "keep_types" filter doesn't work with "pattern" tokenizer

I am having a problem using the "keep_types" filter with a "pattern" tokenizer, here is an example: { "tokenizer": { "type": "pattern", "pattern": "[()., _-]" }, "filter": [ …
0
votes
1 answer

Elasticsearch pattern tokenizer to exclude commas and tags

The element field is indexed with comma-separated values such as dog,cat,mouse. I am using this analyzer to split the above value into 3 elements: dog, cat and mouse. ES config: "settings": { "analysis": { "analyzer": { …
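Since the excerpt is truncated, here is a minimal sketch of the usual shape of such a config: a pattern tokenizer splitting on commas, with an `html_strip` char filter as one assumed way to handle tags (names hypothetical):

```json
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "comma_tokenizer": { "type": "pattern", "pattern": "," }
      },
      "analyzer": {
        "comma_split": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "comma_tokenizer"
        }
      }
    }
  }
}
```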