I want to use a kNN algorithm to build something like a search recommendation engine for a cookbook app. The idea is that if the user inputs something like "chicken", "chicken breast", "chicken meat", "boneless chicken", etc., their recommendation will be "chicken".

I have implemented a more or less working version of kNN, but the problem is measuring the distance. My first choice was Levenshtein distance, but the input can contain several words. At the moment I am calculating Levenshtein distance on a word-by-word basis and taking the average, but this is still not a perfect solution. I was hoping someone might suggest something better.

I realise that the dataset is very unusual, so a lot is going to depend on what is in it. If you think that kNN is unnecessary here and I should just be comparing the strings, you might be right, but I would still need another way of comparing them. The hope is that the algorithm would be able to correctly differentiate "chicken meat" from "turkey meat" and tolerate small mistakes in the input (spelling). Obviously, it doesn't have to be that accurate.

Another possibly important point is that user input has to be categorised somehow, otherwise it cannot be used. So if the user types something that cannot be categorised, they are expected to try again (the hope is that the algorithm will find a suggestion eventually and it won't be too annoying).
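For illustration, the word-by-word averaging described above could look roughly like the sketch below. This is not the actual implementation: the function names (`levenshtein`, `phrase_distance`, `recommend`) are hypothetical, and two details are assumptions made for the example, namely that each query word is compared with its closest word in the candidate label, and that each distance is normalised by the longer word's length so that long and short words contribute comparably.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def phrase_distance(query: str, label: str) -> float:
    """Average, over query words, of the normalised distance to the
    closest word in the label (assumption: closest-word matching)."""
    query_words = query.lower().split()
    label_words = label.lower().split()
    if not query_words or not label_words:
        return 1.0  # treat empty input as maximally distant
    total = 0.0
    for qw in query_words:
        total += min(
            levenshtein(qw, lw) / max(len(qw), len(lw))
            for lw in label_words
        )
    return total / len(query_words)


def recommend(query: str, labels: list[str]) -> str:
    """Return the label with the smallest phrase distance to the query."""
    return min(labels, key=lambda label: phrase_distance(query, label))


labels = ["chicken", "turkey", "beef"]
print(recommend("chiken breast", labels))
print(recommend("turkey meat", labels))
```

Note that in this sketch extra words such as "meat" or "breast" still raise the average, which is one reason the averaged score is not a perfect measure.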

Ilya Lapan

1 Answer

It seems like you want semantic distances, not typographic distances. This is challenging, but take a look at https://en.wikipedia.org/wiki/Word2vec
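To make the idea concrete, here is a minimal sketch of comparing phrases by embedding vectors rather than by spelling. Everything in it is an assumption for illustration: the three-dimensional vectors in `TOY_VECTORS` are made up by hand and stand in for real Word2vec embeddings (which typically have hundreds of dimensions and come from a trained model), and the helper names are hypothetical. A phrase is represented by the average of its word vectors and compared with each category by cosine similarity.

```python
import math

# Hypothetical, hand-made vectors standing in for trained embeddings.
TOY_VECTORS = {
    "chicken": [0.9, 0.1, 0.0],
    "breast": [0.7, 0.2, 0.1],
    "turkey": [0.1, 0.9, 0.0],
    "meat": [0.5, 0.5, 0.1],
    "beef": [0.0, 0.1, 0.9],
}


def phrase_vector(phrase):
    """Average the vectors of the known words; None if no word is known."""
    words = [w for w in phrase.lower().split() if w in TOY_VECTORS]
    if not words:
        return None
    dims = len(TOY_VECTORS[words[0]])
    return [
        sum(TOY_VECTORS[w][d] for w in words) / len(words)
        for d in range(dims)
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def closest_category(phrase, categories):
    """Return the category whose vector is most similar to the phrase."""
    vec = phrase_vector(phrase)
    if vec is None:
        return None  # nothing recognisable: the user would have to try again
    return max(categories, key=lambda c: cosine_similarity(vec, TOY_VECTORS[c]))


categories = ["chicken", "turkey", "beef"]
print(closest_category("chicken meat", categories))
print(closest_category("turkey meat", categories))
```

Misspelled words are simply skipped in this sketch because they are not in the vocabulary, so spelling tolerance would still need a separate step, for example an edit-distance lookup against the vocabulary.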

fgregg