Questions tagged [elasticsearch-analyzers]
145 questions
0
votes
1 answer
Adding exclusions to multi-word synonyms in Elasticsearch
I have the following synonyms (just for this example)
"synonyms": {
"type": "synonym_graph",
"expand": true,
"lenient": true,
"tokenizer": "standard",
"synonyms": [
"french => french, ethnicity",
"toast => toast, cheese sandwich"
]}
What…

Lior Magen
- 1,533
- 2
- 15
- 33
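One workaround often suggested for this kind of exclusion is to add an identity rule for the multi-word phrase, so that the graph filter matches the longer rule before the single-word ones fire. A minimal sketch, with invented index and analyzer names:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonyms": {
          "type": "synonym_graph",
          "expand": true,
          "lenient": true,
          "synonyms": [
            "french toast => french toast",
            "french => french, ethnicity",
            "toast => toast, cheese sandwich"
          ]
        }
      },
      "analyzer": {
        "synonym_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "synonyms"]
        }
      }
    }
  }
}
```

Assuming the longest matching rule wins, "french toast" would then no longer trigger the "french => ethnicity" or "toast => cheese sandwich" expansions; this is a sketch of the workaround, not a guaranteed fix.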
0
votes
2 answers
Prioritizing original terms over synonyms in Elasticsearch
In my analyzer pipeline I have a synonym filter.
Let's say that I have the following synonyms:
beverage, drink
Now let's say that a user searches for 'beverage'; the user will get documents that contain 'beverage' or 'drink' without any…

Lior Magen
- 1,533
- 2
- 15
- 33
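A common pattern for ranking exact matches above synonym matches is to index the field twice, once without and once with the synonym filter, and boost the plain subfield at query time. A sketch, assuming a multi-field mapping with a hypothetical `name.synonyms` subfield that carries the synonym analyzer:

```
GET /my-index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "name":          { "query": "beverage", "boost": 2 } } },
        { "match": { "name.synonyms": { "query": "beverage" } } }
      ]
    }
  }
}
```

Documents containing the literal term then match both clauses and score higher than documents matched only via the synonym.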
0
votes
1 answer
What types are best for an Elasticsearch "KEYWORDS" (hashtag-like) field?
I want to build an Elasticsearch index for KEYWORDS, like hashtags,
and add a synonym filter for the keywords.
I can think of two ways of indexing keywords; the first is to use the keyword type.
{
  "settings": {
    "keywordField": {
      "type":…

JYL
- 193
- 2
- 16
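For hashtag-like values that also need synonyms, the keyword field type is limiting, since keyword fields only support normalizers and normalizers cannot contain synonym filters. One alternative is a text field with the keyword tokenizer, which keeps each tag as a single token while still allowing token filters. A sketch with invented names:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "tag_synonyms": {
          "type": "synonym",
          "synonyms": ["es, elasticsearch"]
        }
      },
      "analyzer": {
        "tag_analyzer": {
          "tokenizer": "keyword",
          "filter": ["lowercase", "tag_synonyms"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "keywords": { "type": "text", "analyzer": "tag_analyzer" }
    }
  }
}
```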
0
votes
1 answer
Python Elasticsearch: errors when trying to apply an analyzer to index documents
So I'm trying to apply an analyzer to my index, but no matter what I do I get some sort of error. I've been looking things up all day but can't get it to work. If I run it as it is below, I get an error which says
…

L M
- 65
- 1
- 7
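One frequent cause of such errors is that analysis settings can only be defined at index creation time or while the index is closed. Expressed as console requests (the analyzer shown is a placeholder, not the poster's actual settings):

```
POST /my-index/_close

PUT /my-index/_settings
{
  "analysis": {
    "analyzer": {
      "my_analyzer": {
        "tokenizer": "standard",
        "filter": ["lowercase"]
      }
    }
  }
}

POST /my-index/_open
```

Note that documents indexed before the change are not re-analyzed; a reindex is needed for the new analyzer to apply to existing data.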
0
votes
0 answers
Elasticsearch and product nomenclature: hyphens and spaces
I'm having a hard time figuring out how to set up Elasticsearch for the typical product model nomenclature. For instance, a product called "Shure SM7B" should appear as a result when searching for SM7B, SM 7B, SM 7, SM-7... and vice versa: searching…

Xavier
- 1
- 2
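For model numbers like "SM7B" vs "SM 7B" vs "SM-7", the word_delimiter_graph filter with its catenate options is one way to generate the variant tokens at index time; a sketch, not a tuned configuration:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "model_delimiter": {
          "type": "word_delimiter_graph",
          "generate_word_parts": true,
          "generate_number_parts": true,
          "split_on_numerics": true,
          "catenate_all": true,
          "preserve_original": true
        }
      },
      "analyzer": {
        "model_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["model_delimiter", "lowercase", "flatten_graph"]
        }
      }
    }
  }
}
```

"SM-7B" then produces "sm", "7", "b", the catenated "sm7b", and the original token; flatten_graph is included because word_delimiter_graph emits token graphs that index-time analyzers cannot store directly.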
0
votes
1 answer
Elasticsearch: ignore accents on search
I have an Elasticsearch index with customer information.
I have some issues finding results with accents.
For example, I have {name: 'anais'} and {name: 'anaïs'}.
Running
GET /my-index/_search
{
  "size": 25,
  "query": {
    "match": {"name":…

Ajouve
- 9,735
- 26
- 90
- 137
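Accent-insensitive matching is usually handled at analysis time with the asciifolding token filter, applied at both index and search time so that 'anaïs' and 'anais' produce the same token. A sketch with invented names:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": { "type": "text", "analyzer": "folding_analyzer" }
    }
  }
}
```

The existing documents would need to be reindexed for the folded tokens to exist in the index.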
0
votes
1 answer
Strange tokenization in Lucene 8 Brazilian Portuguese analyzers
I'm using Lucene 8.6.2 (currently the latest available) with AdoptOpenJDK 11 on Windows 10, and I'm having odd problems with the Portuguese and Brazilian Portuguese analyzers mangling the tokenization.
Let's take a simple example: the first line of…

Garret Wilson
- 18,219
- 30
- 144
- 272
0
votes
1 answer
How to update the tokens of standard tokenizer
I am using the standard tokenizer in my Elasticsearch plugin. I need to iterate over each token produced by the standard tokenizer and write some encrypted text to the Lucene index instead. Is there any way to update the tokens of the standard tokenizer? Can anyone help?

Brisi
- 1,781
- 7
- 26
- 41
0
votes
1 answer
Using different language analyzers with ngram Analyzer in one mapping in Elasticsearch
I want to use English and German custom analyzers together with other analyzers, for example ngram. Is the following mapping correct? I am getting an error for the German analyzer: [unknown setting [index.filter.german_stop.type]]. I searched but I did not…

yolo25
- 45
- 6
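The error [unknown setting [index.filter.german_stop.type]] usually means the filter definitions ended up directly under settings instead of under settings.analysis.filter. A sketch of the expected nesting (the filter and analyzer names here are illustrative):

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "german_stop": {
          "type": "stop",
          "stopwords": "_german_"
        },
        "german_stemmer": {
          "type": "stemmer",
          "language": "light_german"
        }
      },
      "analyzer": {
        "german_custom": {
          "tokenizer": "standard",
          "filter": ["lowercase", "german_stop", "german_normalization", "german_stemmer"]
        }
      }
    }
  }
}
```

Additional analyzers (English, ngram, …) can be added as siblings under the same analysis.analyzer object and assigned per field in the mapping.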
0
votes
1 answer
Adding an analyzer to an Elasticsearch field of type array
I have an Elasticsearch object in which one field is an array type. Now I want to apply a different analyzer than the standard default one. When I pass an analyzer in the index definition, it throws an error. How can I do this?
In the below example, skills…

Rajeev
- 4,762
- 8
- 41
- 63
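Elasticsearch has no dedicated array type: any field may hold one or many values, so the analyzer is simply set on the element's mapping, not on an "array" type. A sketch for a skills field (names invented):

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "skill_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "skills": { "type": "text", "analyzer": "skill_analyzer" }
    }
  }
}
```

A document with "skills": ["java", "python"] then has each element analyzed with skill_analyzer.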
0
votes
1 answer
"asciifolding" with tokenizer "pattern" in elasticsearch
can anyone tell me why "asciifolding" doesn't work on the "pattern" tokenizer in my mapping below?
I need to use the "pattern" tokenizer but I also need to not differentiate words with an accent or without an accent function that "asciifolding"…

Jean
- 47
- 6
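asciifolding is a token filter, so it cannot be attached to the tokenizer itself; it has to appear in the analyzer's filter list, where it runs on the tokens the pattern tokenizer has already produced. A sketch (tokenizer name and pattern are placeholders):

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "my_pattern": {
          "type": "pattern",
          "pattern": "[\\W]+"
        }
      },
      "analyzer": {
        "pattern_folding": {
          "type": "custom",
          "tokenizer": "my_pattern",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  }
}
```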
0
votes
2 answers
Searchkick stemming
I'm using Searchkick and see that a search for "animals" is returning results for "anime" because they share the stem "anim". Does anyone have any suggestions on how to improve these results?
I see in the docs you can do something like
exclude_queries = {
…

user2031423
- 347
- 4
- 7
0
votes
0 answers
Elasticsearch: combining multi-word tokens into a single token
Basically, let's say I have a list of phrases in a vocabulary:
- University of Texas Dallas
- University of Tokyo
- University of Toronto
Let's say I have 3 documents
- doc1: I study at University of Texas Dallas and it's awesome
- doc2: I study at…

Kaushik J
- 962
- 7
- 17
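One way to collapse known multi-word phrases into single tokens is a synonym_graph rule per phrase, mapping each to an underscore-joined form. A sketch built from the vocabulary above (index, filter, and analyzer names invented):

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "filter": {
        "phrase_joiner": {
          "type": "synonym_graph",
          "synonyms": [
            "university of texas dallas => university_of_texas_dallas",
            "university of tokyo => university_of_tokyo",
            "university of toronto => university_of_toronto"
          ]
        }
      },
      "analyzer": {
        "phrase_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "phrase_joiner"]
        }
      }
    }
  }
}
```

synonym_graph is intended for search-time analyzers; for an index-time analyzer the plain synonym filter is the safer choice, though the single-token rules above produce no graph either way.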
0
votes
1 answer
Elasticsearch "keep_types" filter doesn't work with "pattern" tokenizer
I am having a problem using the "keep_types" filter with a "pattern" tokenizer, here is an example:
{
  "tokenizer": {
    "type": "pattern",
    "pattern": "[()., _-]"
  },
  "filter": [
    …
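The likely explanation is that the pattern tokenizer labels every token with the type word, whereas keep_types keys on types such as <NUM> and <ALPHANUM> that only tokenizers like standard assign. This can be checked with _analyze:

```
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    { "type": "keep_types", "types": ["<NUM>"] }
  ],
  "text": "item 42 and 7"
}
```

With the standard tokenizer this keeps only the numeric tokens; swapping in a pattern tokenizer keeps nothing, because every emitted token has type word.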
0
votes
1 answer
Elasticsearch pattern tokenizer to exclude commas and tags
The element field is indexed with comma-separated values such as dog,cat,mouse. I am using this analyzer to split the above value into 3 elements: dog, cat and mouse.
ES config
"settings": {
"analysis": {
"analyzer": {
…

shAkur
- 944
- 2
- 21
- 45
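Splitting on commas alone can be done with a pattern tokenizer whose pattern is just the comma; a sketch with invented names:

```
PUT /my-index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "comma_tokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "analyzer": {
        "comma_analyzer": {
          "tokenizer": "comma_tokenizer",
          "filter": ["lowercase"]
        }
      }
    }
  }
}
```

"dog,cat,mouse" then tokenizes to dog, cat and mouse; anything else to be excluded would need to be added to the pattern or handled by a token filter.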