I'm trying to find a solution that allows students to search jobs based on a role query.
I've managed to get exactly what I want using cross_fields, but I lose fuzziness.
Here is my dataset for testing purposes:
POST _bulk
{"index":{"_index":"duarte-search-role","_id":"Job 1"}}
{"title":"Marine Biologist 1","overview":"Marine Biologist","opportunity_type_name":"Graduate Job","expired":false}
{"index":{"_index":"duarte-search-role","_id":"Job 2"}}
{"title":"Marine Biologist 2","overview":"No keyword","opportunity_type_name":"Graduate Job","expired":false}
{"index":{"_index":"duarte-search-role","_id":"Job 3"}}
{"title":"Comparison Job 3","overview":"Marine Biologist","opportunity_type_name":"Graduate Job","expired":false}
{"index":{"_index":"duarte-search-role","_id":"Job 4"}}
{"title":"Marine Biologist 4","overview":"Marine Biologist","opportunity_type_name":"Internship","expired":false}
{"index":{"_index":"duarte-search-role","_id":"Job 5"}}
{"title":"Marine Biologist 5","overview":"No keyword","opportunity_type_name":"Internship","expired":false}
{"index":{"_index":"duarte-search-role","_id":"Job 6"}}
{"title":"Comparison Job 6","overview":"Marine Biologist","opportunity_type_name":"Internship","expired":false}
I want to search across all fields using fuzziness.
For example, if someone types "Marine Biologist":
- Job 1 comes first because it has the phrase in both the title and the overview
- Job 4 comes after for the same reason
- Job 2 comes after because it has the phrase only in the title
- etc
If someone searches for "Graduate Marine Biologist":
- Job 1 comes first because it has the phrase "Marine Biologist" in both the title and the overview, and it has "Graduate" in the opportunity type.
- Job 2 comes second because it has the phrase "Marine Biologist" in the title and "Graduate" in the opportunity type.
- etc
If someone searches for "Marine Biologist Internship":
- Job 4 comes first because it has the phrase "Marine Biologist" in both the title and the overview, and it has "Internship" in the opportunity type.
- Job 5 comes second because it has the phrase "Marine Biologist" in the title and "Internship" in the opportunity type.
- etc
I can get exactly the results described above with this query:
GET /duarte-search-role/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "Marine Biologist Internship",
            "fields": [
              "title^100",
              "overview^50",
              "opportunity_type_name^30"
            ],
            "operator": "and",
            "type": "cross_fields",
            "tie_breaker": 1
          }
        }
      ],
      "filter": [
        {
          "term": {
            "expired": false
          }
        }
      ]
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "application_close_date": {
        "order": "asc"
      }
    }
  ],
  "from": 0,
  "size": 8
}
The problem is that cross_fields doesn't support fuzziness, and I want to tolerate spelling errors and similar mistakes instead of relying on students to type exact-match words.
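One possible direction, shown only as a minimal sketch and not verified against the expected rankings above: split the search text into terms on the client and send one fuzzy multi_match clause per term inside bool/must, so that every term still has to match in at least one field. The sketch is written in Python purely to build the request body. The helper name build_fuzzy_query, the whitespace split, the choice of most_fields, and "fuzziness": "AUTO" are all assumptions, and the scores will not be identical to cross_fields because cross_fields blends term statistics across fields.

```python
import json

# Field boosts copied from the cross_fields query above.
FIELDS = ["title^100", "overview^50", "opportunity_type_name^30"]


def build_fuzzy_query(text, fields=FIELDS):
    """Build a bool query with one fuzzy multi_match clause per search term.

    Hypothetical helper: the name and the whitespace tokenisation are
    assumptions made for illustration, not part of any OpenSearch API.
    """
    per_term = [
        {
            "multi_match": {
                "query": term,
                "fields": fields,
                # most_fields combines the scores of all matching fields,
                # which is the closest analogue to tie_breaker: 1 here.
                "type": "most_fields",
                # Assumption: AUTO picks the edit distance from term length.
                "fuzziness": "AUTO",
            }
        }
        for term in text.split()
    ]
    return {
        "query": {
            "bool": {
                # Each term must match in at least one field, which mimics
                # operator "and" applied across fields.
                "must": per_term,
                "filter": [{"term": {"expired": False}}],
            }
        },
        "from": 0,
        "size": 8,
    }


if __name__ == "__main__":
    # Misspelled on purpose to show the kind of input fuzziness should cover.
    print(json.dumps(build_fuzzy_query("Marin Biologst Internship"), indent=2))
```

The printed JSON can be pasted as the body of the same _search request; the sort section from the original query is left out of the sketch and would need to be added back.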
Is there a way to rewrite my cross_fields query in OpenSearch so that it produces the same ranking but still supports fuzziness?
Thanks!