I'm working with OpenSearch, and I have a large input text that contains several exercise names. I'd like to extract these exercise names from the input text and search for documents that match these names in my OpenSearch index.

The input text can be of any format and contain various characters, such as lowercase or uppercase letters, numbers, and special characters. Exercise names within the input text are not guaranteed to start with a capital letter or follow any specific pattern. Here's an example of an input text:

I will make a good 10 push-ups and Dumbbell Deficit Push-up

My index contains:

[
    {
        "id": 2,
        "name": "Ankle Circles"
    },
    {
        "id": 3,
        "name": "Barbell Deep Squat"
    },
    {
        "id": 10,
        "name": "Push-ups"
    },
    {
        "id": 11,
        "name": "Sit-up"
    },
    {
        "id": 12,
        "name": "Air Squats"
    },
    {
        "id": 13,
        "name": "Dumbbell Deficit Push-up"
    },
    {
        "id": 14,
        "name": "Pretzel Stretch"
    },
    {
        "id": 15,
        "name": "Cobra Stretch"
    },
    {
        "id": 20,
        "name": "Push-ups with Elevated Feet"
    }...
]

Here is my search request:

SearchResponse<ExerciseOSDto> searchResponse = openSearchClient.search(
        s -> s.index("exercises")
            .query(new Query.Builder().match(
                    new MatchQuery.Builder()
                        .field("name")
                        .query(new FieldValue.Builder()
                            .stringValue(payload.getText()).build())
                        .operator(Operator.Or) 
                        .build())
                .build()), ExerciseOSDto.class);

But with this query I get every exercise that shares any term with the input (for example up, ups, or push).
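To see why, here is a rough offline simulation in Python. It assumes the name field uses the default standard analyzer (the question does not show the mapping) and approximates that analyzer with a lowercase-and-split-on-non-alphanumerics regex for ASCII text. `standard_like_tokens` and `or_match` are hypothetical helpers for illustration, not OpenSearch APIs, and the sketch models only which documents match under OR semantics, not scoring.

```python
import re

# Documents shown in the question's index (id -> name); the "..." tail is omitted.
EXERCISES = {
    2: "Ankle Circles",
    3: "Barbell Deep Squat",
    10: "Push-ups",
    11: "Sit-up",
    12: "Air Squats",
    13: "Dumbbell Deficit Push-up",
    14: "Pretzel Stretch",
    15: "Cobra Stretch",
    20: "Push-ups with Elevated Feet",
}


def standard_like_tokens(text):
    # Rough ASCII-only approximation of the standard analyzer: lowercase,
    # then split on every non-alphanumeric character ("Push-ups" -> push, ups).
    return re.findall(r"[a-z0-9]+", text.lower())


def or_match(query, docs, tokenize):
    # With operator OR, a document matches if it shares at least one token
    # with the query text; return the matching ids in ascending order.
    query_tokens = set(tokenize(query))
    return sorted(
        doc_id for doc_id, name in docs.items()
        if query_tokens & set(tokenize(name))
    )


text = "I will make a good 10 push-ups and Dumbbell Deficit Push-up"
print(or_match(text, EXERCISES, standard_like_tokens))  # [10, 11, 13, 20]
```

Because the hyphen acts as a token boundary here, push-ups contributes push and ups, and the trailing push-up contributes up, which is enough to pull in Sit-up and Push-ups with Elevated Feet as well.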

For this input text, I'd like to get only the exercises with ids 10 and 13.

What is the best approach to extract these exercise names from the input text and perform a search in OpenSearch?

Any help or guidance would be much appreciated!

Taras Vovk

1 Answer

You can customize the analyzer to get closer to what you want.

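Here I create a custom analyzer with the whitespace tokenizer and the lowercase token filter. Because it splits only on whitespace, hyphenated names such as push-ups stay single tokens instead of being split into push and ups: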

PUT exercises
{
  "settings": {
    "analysis": {
      "analyzer": {
        "exercise_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "exercise_analyzer"
      }
    }
  }
}

After indexing the data, you can run the match query (the same logic as in the code you provided):

GET exercises/_search
{
  "query": {
    "match": {
      "name": "I will make a good 10 push-ups and Dumbbell Deficit Push-up"
    }
  }
}

Note, however, that this approach still matches some documents you don't want; in this case, for example, Push-ups with Elevated Feet.
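A quick offline check of this claim, again a simplified Python simulation rather than real OpenSearch behavior. `whitespace_lowercase_tokens` is a hypothetical stand-in for `exercise_analyzer` (whitespace tokenizer plus lowercase filter), and the sketch only models which documents match under OR semantics, not relevance scoring.

```python
# Documents shown in the question's index (id -> name); the "..." tail is omitted.
EXERCISES = {
    2: "Ankle Circles",
    3: "Barbell Deep Squat",
    10: "Push-ups",
    11: "Sit-up",
    12: "Air Squats",
    13: "Dumbbell Deficit Push-up",
    14: "Pretzel Stretch",
    15: "Cobra Stretch",
    20: "Push-ups with Elevated Feet",
}


def whitespace_lowercase_tokens(text):
    # Approximates exercise_analyzer: split on whitespace, then lowercase,
    # so "Push-ups" stays the single token "push-ups".
    return [token.lower() for token in text.split()]


def or_match(query, docs, tokenize):
    # A document matches if it shares at least one token with the query.
    query_tokens = set(tokenize(query))
    return sorted(
        doc_id for doc_id, name in docs.items()
        if query_tokens & set(tokenize(name))
    )


text = "I will make a good 10 push-ups and Dumbbell Deficit Push-up"
print(or_match(text, EXERCISES, whitespace_lowercase_tokens))  # [10, 13, 20]
```

Sit-up drops out because sit-up no longer shares a token with the input, but Push-ups with Elevated Feet still contains the token push-ups, so it matches.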

This is very hard to achieve if you rely only on full-text search in Elasticsearch/OpenSearch.

I think the easiest way is to apply additional filtering logic on the client side after you get the search results from Elasticsearch/OpenSearch:

# input_str represents the input text
# results represents the exercise names you got from OpenSearch
final_results = [r for r in results if r.lower() in input_str.lower()]
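A runnable version of that filter, plus a hedged variant. A plain substring test also accepts a shorter name that is only part of a longer word (a hypothetical `Push-up` entry would match the text 10 push-ups), so the second function additionally requires that the name not be glued to neighboring letters or digits. Both function names are illustrative, and `results` is assumed to hold the names returned by the search above (ids 10, 13 and 20).

```python
import re


def filter_contained(results, input_str):
    # Keep names that appear verbatim (case-insensitively) in the input text.
    text = input_str.lower()
    return [r for r in results if r.lower() in text]


def filter_contained_whole_words(results, input_str):
    # Same check, but the name must not be preceded or followed by a letter
    # or digit, so "push-up" is not found inside "push-ups".
    text = input_str.lower()
    return [
        r for r in results
        if re.search(r"(?<![a-z0-9])" + re.escape(r.lower()) + r"(?![a-z0-9])", text)
    ]


input_str = "I will make a good 10 push-ups and Dumbbell Deficit Push-up"
results = ["Push-ups", "Dumbbell Deficit Push-up", "Push-ups with Elevated Feet"]

print(filter_contained(results, input_str))
# ['Push-ups', 'Dumbbell Deficit Push-up']

# Hypothetical shorter name, to show where the two variants differ:
print(filter_contained(["Push-up"], "10 push-ups"))              # ['Push-up']
print(filter_contained_whole_words(["Push-up"], "10 push-ups"))  # []
```

The whole-word check is stricter; neither variant handles spelling variants such as push up versus push-ups.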

Let me know if I missed your point or if anything doesn't work for you. Thank you!

Y.Peng