I'm currently integrating Elasticsearch as a search interface into an existing application. The application is a classic 3-tier application with an Oracle SQL database.
I have the entity 'Person' (a database table) with the following attributes:
- first name
- last name
- full name (first name and last name concatenated)
- person number (person-Nr.)
- company name
- a list of addresses, each with street, zip code, city, phone and email
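To make the structure concrete, here is a minimal Python sketch of the entity. The class and field names (`Person`, `Address`, `person_nr`, ...) are hypothetical, `person_nr` is assumed to be a string, and `full_name` is derived from the two name fields here, whereas in the database it is a separate column.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    zipcode: str
    city: str
    phone: str
    email: str


@dataclass
class Person:
    first_name: str
    last_name: str
    person_nr: str  # assumed to be a string; it may be numeric in the database
    company_name: str
    addresses: List[Address] = field(default_factory=list)

    @property
    def full_name(self) -> str:
        # Derived here; in the database this is its own column
        return f"{self.first_name} {self.last_name}"
```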
So far, I have mapped it 1:1 into Elasticsearch, with one Elasticsearch property per database column. Synchronisation and full loads of the data are no problem. But I'm struggling to provide a "good" search experience, as there are many different things to pay attention to:
- Fuzzy search (tolerance of an edit distance of one or two)
- Wildcard search (if I type "Ange", it should also find results with "Angelina")
- E-mail address search (I'm already using the `uax_url_email` tokenizer in combination with the `keyword` datatype)
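A 1:1 mapping along these lines might look like the sketch below, written as the Python dict one would send as the index-creation body (assuming Elasticsearch 7 or later, where mappings carry no type name). The field names and `email_analyzer` are hypothetical. Because an analyzer cannot be attached to a `keyword` field, this sketch assumes the `uax_url_email` tokenizer and the `keyword` datatype are combined as a multi-field: analyzed text plus a `raw` keyword sub-field.

```python
import json

# Hypothetical index body: one property per database column
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Custom analyzer that keeps e-mail addresses and URLs as single tokens
                "email_analyzer": {
                    "type": "custom",
                    "tokenizer": "uax_url_email",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "first_name": {"type": "text"},
            "last_name": {"type": "text"},
            "full_name": {"type": "text"},
            "person_nr": {"type": "keyword"},
            "company_name": {"type": "text"},
            # Plain object (not nested), so fields are addressable as addresses.city etc.
            "addresses": {
                "properties": {
                    "street": {"type": "text"},
                    "zipcode": {"type": "keyword"},
                    "city": {"type": "text"},
                    "phone": {"type": "keyword"},
                    "email": {
                        "type": "text",
                        "analyzer": "email_analyzer",
                        # Keyword sub-field for exact matches on the whole address
                        "fields": {"raw": {"type": "keyword"}},
                    },
                }
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(INDEX_BODY, indent=2))
```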
As far as I can tell, `multi_match` with type `cross_fields` would be a good choice, but it supports neither fuzzy search nor wildcards. Type `best_fields` is not an option either, because (as far as I know) it can't do wildcard search. `most_fields` is also unsuitable, and phrase matching can't do fuzziness.
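The parameter combinations described above can be made concrete with a small body builder. This is a sketch: `multi_match_query` is a hypothetical helper, and its check mirrors the fact that Elasticsearch rejects `fuzziness` for the `cross_fields`, `phrase` and `phrase_prefix` types. Even with `fuzziness: "AUTO"`, `best_fields` does no prefix expansion, so "ange" would not reach "angelina", which is four edits away.

```python
from typing import Iterable, Optional, Union

# multi_match types for which Elasticsearch rejects the "fuzziness" parameter
NO_FUZZINESS_TYPES = {"cross_fields", "phrase", "phrase_prefix"}


def multi_match_query(
    user_input: str,
    fields: Iterable[str],
    match_type: str = "best_fields",
    fuzziness: Optional[Union[int, str]] = None,
) -> dict:
    """Build a multi_match request body (hypothetical helper)."""
    if fuzziness is not None and match_type in NO_FUZZINESS_TYPES:
        # Fail early instead of sending a query the cluster would reject
        raise ValueError(f"fuzziness is not allowed for type {match_type!r}")
    clause = {"query": user_input, "type": match_type, "fields": list(fields)}
    if fuzziness is not None:
        clause["fuzziness"] = fuzziness
    return {"query": {"multi_match": clause}}


if __name__ == "__main__":
    # best_fields accepts fuzziness, but no prefix/wildcard expansion happens
    print(multi_match_query("Tom fisher", ["first_name", "last_name", "company_name"], fuzziness="AUTO"))
```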
Because of that, I'm currently using `simple_query_string`. For example, if I enter `Tom fisher` in the search field, the resulting `simple_query_string` query is:
(tom* | tom~1)+(fisher* | fisher~1)
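The transformation from the search-field input to that query string can be sketched as follows. `build_simple_query_string` is a hypothetical helper; it assumes whitespace-separated terms, lowercases them to match the example, and simply strips characters that act as operators in `simple_query_string` syntax instead of escaping them.

```python
import re

# Characters with operator meaning in simple_query_string syntax
_OPERATOR_CHARS = re.compile(r'[+|\-"*()~\\]')


def build_simple_query_string(user_input: str, edit_distance: int = 1) -> str:
    """Turn free text into '(term* | term~N)' groups that must all match."""
    terms = []
    for raw in user_input.lower().split():
        term = _OPERATOR_CHARS.sub("", raw)
        if term:  # skip tokens that consisted only of operator characters
            terms.append(term)
    # Each term matches as a prefix (term*) OR fuzzily (term~N); '+' requires all groups
    return "+".join(f"({t}* | {t}~{edit_distance})" for t in terms)


if __name__ == "__main__":
    print(build_simple_query_string("Tom fisher"))
    # → (tom* | tom~1)+(fisher* | fisher~1)
```

The resulting string would go into the `query` parameter of a `simple_query_string` query, alongside the `fields` to search.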
My question now is: would it be a bad idea to just have one field, "entity_content", which contains the content of all fields? This would be as if I had a .txt document with all the information about the person.
- What are the advantages/disadvantages?
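For comparison, the proposed single field could be filled on the application side before indexing, roughly as sketched below. `build_entity_content` and the field names are hypothetical, and `full_name` is skipped because it only repeats the first and last name. Elasticsearch's `copy_to` mapping parameter can also build such a combined field at index time.

```python
from typing import Any, Dict, List

PERSON_FIELDS = ("first_name", "last_name", "person_nr", "company_name")
ADDRESS_FIELDS = ("street", "zipcode", "city", "phone", "email")


def build_entity_content(person: Dict[str, Any]) -> str:
    """Concatenate all searchable values of a person into one text blob."""
    # full_name is skipped: it only repeats first_name and last_name
    values: List[Any] = [person.get(name) for name in PERSON_FIELDS]
    for address in person.get("addresses", []):
        values.extend(address.get(name) for name in ADDRESS_FIELDS)
    # Drop missing values but keep falsy ones such as 0
    return " ".join(str(v) for v in values if v not in (None, ""))
```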