I am facing some interesting behavior while searching for email addresses with a Query String query:

.filteredQuery(
   queryStringQuery(String.format("*%s*", query))
       .field("firstName").field("lastName").field("email").field("phone"),
   null
)

If I pass domain.com as the query (assuming there is such a value in the index), the results are fine, but once I pass @domain.com, the results are empty. Are there limitations on special symbols?

nKognito

1 Answer

If you set analyze_wildcard to true, it should work. By default, the query string query doesn't analyze terms that contain wildcards; if you set that option to true, Elasticsearch will try to. This option is not perfect, as the docs say:

By setting this value to true, a best effort will be made to analyze those as well.

The reason for your empty result is that the default analyzer removes the @ at index time, whereas when you search for *@domain.com* with analyze_wildcard set to false, the @ is not removed at query time.

The code will look like this:

.filteredQuery(
    // analyzeWildcard(true): also analyze terms that contain wildcards
    queryStringQuery(String.format("*%s*", query)).analyzeWildcard(true)
        .field("firstName").field("lastName").field("email").field("phone"),
    null
)

EDIT: A better explanation of why you get an empty result.

First of all, analyzers can run at index time (you set this in your mapping) and at query time (not all queries run the analyzer at query time).

In your case, at index time the standard analyzer analyzes the email field as follows:

name@domain.com => indexed as name and domain.com

This means that your document will contain two tokens, name and domain.com. If you tried to find the exact term "name@domain.com", you wouldn't find anything, because your document no longer contains the full email.

Now, at query time you are running the query string *@domain.com*. By default, the query string query doesn't analyze terms that contain wildcards, so you are trying to find tokens that contain @domain.com, which is not the case in your index.

Now, if you set the analyze_wildcard property to true, Elasticsearch analyzes terms with wildcards too, so your query is transformed into *domain.com*, and in this case you have documents that match.
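To make the index-time versus query-time mismatch concrete, here is a minimal plain-Java sketch that does not use Elasticsearch at all. The class `WildcardAnalysisSketch` and its methods are hypothetical names invented for this illustration, and `analyze` is only a crude stand-in for the standard analyzer (it lowercases and splits on `@` and whitespace); the real analyzer and the real "best effort" wildcard analysis are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardAnalysisSketch {

    // Crude stand-in for the standard analyzer (assumption, not the real
    // implementation): lowercase, then split on '@' and whitespace.
    public static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.toLowerCase().split("[@\\s]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Rough imitation of analyze_wildcard=true: analyze the literal pieces
    // between '*' characters and put the wildcards back.
    public static String analyzeWildcardPattern(String pattern) {
        List<String> pieces = new ArrayList<>();
        for (String literal : pattern.split("\\*", -1)) {
            pieces.add(String.join("", analyze(literal)));
        }
        return String.join("*", pieces);
    }

    // True if at least one indexed token matches the pattern,
    // where '*' stands for any run of characters.
    public static boolean matchesAnyToken(String pattern, List<String> tokens) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            regex.append(c == '*' ? ".*" : Pattern.quote(String.valueOf(c)));
        }
        for (String token : tokens) {
            if (token.matches(regex.toString())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("name@domain.com");
        System.out.println(indexed); // prints [name, domain.com]

        String raw = "*@domain.com*";
        // analyze_wildcard=false: the '@' stays in the pattern, no token contains it.
        System.out.println(matchesAnyToken(raw, indexed)); // prints false

        // analyze_wildcard=true: the pattern becomes *domain.com* and matches.
        String analyzed = analyzeWildcardPattern(raw);
        System.out.println(analyzed); // prints *domain.com*
        System.out.println(matchesAnyToken(analyzed, indexed)); // prints true
    }
}
```

The point of the sketch is only that the unanalyzed pattern still contains a character that no indexed token has, so nothing can match until the pattern goes through the same analysis as the indexed text.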

moliware
  • Thanks for the reply, but I am kind of confused. You said that by default a query string with a wildcard is not analyzed, but after that you say that the default analyzer removes the `@`, which means that the query string tokens are in fact analyzed. Next, you say that when `analyze_wildcard` is false, the `@` is not removed, and that's why the results are empty. If I understood you right, they are empty because the indexed field was analyzed and all the special characters were removed, right? – nKognito Jul 29 '15 at 11:55
  • @nKognito I edited the answer with a better explanation. Hope it helps. – moliware Jul 30 '15 at 07:31