I have a bunch of domain names, without the TLD, that I'd like to search, but they don't always have a natural break between words (like a "-"). For instance:

techtarget
americanexpress
theamericanexpress // a non-existent site
thefacebook

What is the best analyzer to use? For example, if a user types in "american ex", I'd like to prioritize "americanexpress" over "theamericanexpress". A simple prefix query would work in that particular case, but if a user then types in "facebook", it doesn't return anything. ;(

1 Answer

In most cases, including yours, the Standard Analyzer is sufficient. It is also the default analyzer in Elasticsearch, and it provides grammar-based tokenization. For example, "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone." is tokenized into [ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ].

In your case, each domain name contains no word boundary, so it is kept as a single term. The index therefore holds the terms [techtarget, americanexpress, theamericanexpress, thefacebook].

Why does a search for facebook not return anything?

Because no term facebook is stored in the term dictionary. ES looks up the search term facebook, but the dictionary only contains thefacebook, so the search returns no results.
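The lookup described above can be pictured with a toy model. This is only an illustration of exact term matching, not how Elasticsearch is implemented; the set `indexed_terms` and the helper `term_lookup` are made up for the example.

```python
# Toy model of an exact term lookup. It only illustrates why a partial
# term misses; it is not Elasticsearch's actual data structure.
indexed_terms = {"techtarget", "americanexpress", "theamericanexpress", "thefacebook"}


def term_lookup(term: str) -> bool:
    """Return True only if the whole term is present in the dictionary."""
    return term.lower() in indexed_terms


print(term_lookup("thefacebook"))  # True: the full term is stored
print(term_lookup("facebook"))     # False: only "thefacebook" is stored
```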

Solution:

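To match the search term facebook against thefacebook, you need to wrap wildcards around your search term: a wildcard query for *facebook* (or a regexp query for .*facebook.*) will match thefacebook. However, you should know that wildcard and regexp queries carry a performance overhead, especially when the pattern starts with a wildcard.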
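As a rough sketch, the request body for such a wildcard query could be built like this. The field name `domain` and the helper `wildcard_query` are assumptions for illustration, and the `fnmatch` check at the end is only a local approximation of which stored terms the pattern would hit, not a call to Elasticsearch.

```python
import json
from fnmatch import fnmatchcase

FIELD = "domain"  # hypothetical field name; use whatever your mapping defines


def wildcard_query(user_input: str) -> dict:
    """Build a wildcard query body that matches the input anywhere in the term."""
    # Lowercase and drop whitespace, since the stored domains have no spaces.
    term = "".join(user_input.lower().split())
    return {"query": {"wildcard": {FIELD: {"value": f"*{term}*"}}}}


body = wildcard_query("facebook")
print(json.dumps(body, indent=2))

# Local approximation of the match, using shell-style wildcards.
domains = ["techtarget", "americanexpress", "theamericanexpress", "thefacebook"]
pattern = body["query"]["wildcard"][FIELD]["value"]
matches = [d for d in domains if fnmatchcase(d, pattern)]
print(matches)
```

The printed body is what you would send to the index's `_search` endpoint.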

Another workaround is to use synonyms. A synonym list specifies alternative terms that should be treated as equivalent, e.g. "facebook, thefacebook, facebooksocial, fb, fbook". With these synonyms in place, searching for any one of the terms matches documents containing any of the others. So if your search term is facebook and your domain is stored as thefacebook, the search will match. The drawback is that you have to maintain the list yourself.
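A minimal sketch of index settings with a synonym filter might look like the following. The names `domain_synonyms`, `domain_analyzer`, and the field `domain` are placeholders chosen for this example, and the mapping layout assumes a recent Elasticsearch version without mapping types.

```python
import json

# Hypothetical index definition: a custom analyzer that lowercases tokens
# and then expands them using one synonym group.
index_settings = {
    "settings": {
        "analysis": {
            "filter": {
                "domain_synonyms": {
                    "type": "synonym",
                    "synonyms": ["facebook, thefacebook, facebooksocial, fb, fbook"],
                }
            },
            "analyzer": {
                "domain_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "domain_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "domain": {"type": "text", "analyzer": "domain_analyzer"}
        }
    },
}

# This JSON would be the body of the index creation request.
print(json.dumps(index_settings, indent=2))
```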

For prioritization, you first need to understand how scoring works in ES. Once you do, you can use boosting to rank some matches (for example, prefix matches) above others.
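One way this could look, as a sketch rather than a tuned solution: combine a boosted prefix clause with a wildcard clause in a bool query, so that a domain starting with the input satisfies both clauses and should score above one that merely contains it. The field name `domain`, the helper `prioritized_query`, and the boost value 2.0 are all assumptions for illustration.

```python
import json

FIELD = "domain"  # hypothetical field name


def prioritized_query(user_input: str) -> dict:
    """Build a bool query that favors prefix matches over 'contains' matches."""
    # "american ex" -> "americanex": the stored domains contain no spaces.
    term = "".join(user_input.lower().split())
    return {
        "query": {
            "bool": {
                "should": [
                    # Matches americanexpress but not theamericanexpress.
                    {"prefix": {FIELD: {"value": term, "boost": 2.0}}},
                    # Matches both; acts as the lower-priority fallback.
                    {"wildcard": {FIELD: {"value": f"*{term}*", "boost": 1.0}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


print(json.dumps(prioritized_query("american ex"), indent=2))
```

With this shape, "facebook" still finds thefacebook through the wildcard clause, while "american ex" is expected to rank americanexpress first because it also matches the prefix clause.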

Ra Ka