Let's say I have a table named objects with the fields id, name and misc.

How can I find rows with similar or duplicate name values? I can see that MySQL itself can be used to search for duplicate values, but not for similar ones, e.g. PHP Hypertext Preprocessor and PHP Hypertext Postprocessor (roughly 90% similar to the source value).
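One rough way to put a number on "~90% similar" is a ratio between 0 and 1. A minimal sketch, assuming Python's standard-library difflib; note that its ratio is based on Ratcliff/Obershelp-style block matching (2 * matched characters / total length), not on Levenshtein distance:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio: 2 * matched characters / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b).ratio()


score = similarity("PHP Hypertext Preprocessor", "PHP Hypertext Postprocessor")
print(f"{score:.2f}")  # 0.91
```

Here 24 characters match ("PHP Hypertext P" plus "processor") out of 53 in total, so the ratio is 48/53.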

Can it be performed with Sphinx? And how?

Pavel S.

2 Answers

I don't know the details of Sphinx, but what you're describing sounds like calculating Levenshtein distances. Quickly googling for "sphinx php levenshtein", I found this thread, which describes a method that might work for you. Hopefully that gives you something to go on.
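For reference, a minimal Python sketch of the standard dynamic-programming Levenshtein distance, plus one common way to turn it into a percentage. Normalizing by the length of the longer string is an assumption (other normalizations exist), and this is not the method from the linked thread:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with insert, delete and substitute each costing 1."""
    # Keep only the previous row of the DP table: O(len(b)) memory.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def lev_similarity(a: str, b: str) -> float:
    """Normalize to 0..1 as 1 - distance / length of the longer string."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))


a, b = "PHP Hypertext Preprocessor", "PHP Hypertext Postprocessor"
print(levenshtein(a, b), round(lev_similarity(a, b), 2))  # 3 0.89
```

"Pre" becomes "Post" with two substitutions and one insertion, so the distance is 3 and the similarity is 1 - 3/27.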

Gordon Bailey
  • I'm aware of Levenshtein distance, but it is just a method to compute the difference between two given strings. What I need is the actual set of rows with similar field values. This can be done with brute-force (dumb) algorithms, but I want to find out whether smarter solutions exist. Still, thank you for pointing this out. – Pavel S. Feb 27 '12 at 15:53
  • No problem, sorry it wasn't what you were looking for. Good luck with this. – Gordon Bailey Feb 27 '12 at 15:59
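For comparison, the brute-force approach mentioned in the comment could look like this minimal sketch: load id and name, compare every pair in application code, and keep pairs above a threshold. The sample rows, the 0.9 threshold and the use of difflib's ratio are illustrative assumptions:

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar_pairs(rows, threshold=0.9):
    """Compare every pair of (id, name) rows: n * (n - 1) / 2 comparisons.

    Returns (id1, id2, score) for pairs whose difflib ratio >= threshold.
    """
    result = []
    for (id1, name1), (id2, name2) in combinations(rows, 2):
        score = SequenceMatcher(None, name1, name2).ratio()
        if score >= threshold:
            result.append((id1, id2, round(score, 2)))
    return result


# Hypothetical rows, e.g. the result of SELECT id, name FROM objects
rows = [
    (1, "PHP Hypertext Preprocessor"),
    (2, "PHP Hypertext Postprocessor"),
    (3, "MySQL"),
    (4, "PHP Hypertext Preprocessor"),
]
print(similar_pairs(rows))  # [(1, 2, 0.91), (1, 4, 1.0), (2, 4, 0.91)]
```

The quadratic number of comparisons is what makes this impractical on large tables.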

The 'suggest' example from Sphinx might be a useful starting point.

http://code.google.com/p/sphinxsearch/source/browse/trunk/#trunk%2Fmisc%2Fsuggest
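As I understand it, the suggest example works in two steps: it indexes character trigrams, lets Sphinx fetch candidates that share enough trigrams, and then ranks those candidates by Levenshtein distance. The sketch below only imitates that idea in plain Python on hypothetical (id, name) rows, without Sphinx. The space padding, the 50% trigram-overlap filter and the 0.85 similarity threshold are illustrative assumptions, not values taken from the example:

```python
from collections import defaultdict


def trigrams(s):
    """Set of character trigrams of a lowercased string, padded with spaces."""
    s = "  " + s.lower() + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert/delete/substitute, cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def candidate_pairs(rows, min_overlap=0.5):
    """Step 1: use an inverted trigram index to find pairs sharing many trigrams."""
    grams = {rid: trigrams(name) for rid, name in rows}
    index = defaultdict(set)  # trigram -> ids of rows containing it
    for rid, gs in grams.items():
        for g in gs:
            index[g].add(rid)
    pairs = []
    for rid, gs in grams.items():
        shared = defaultdict(int)  # other id -> number of shared trigrams
        for g in gs:
            for other in index[g]:
                if other > rid:  # count each unordered pair once (ids assumed comparable)
                    shared[other] += 1
        for other, n in shared.items():
            if n >= min_overlap * min(len(gs), len(grams[other])):
                pairs.append((rid, other))
    return sorted(pairs)


def similar_rows(rows, threshold=0.85):
    """Step 2: verify only the candidates with normalized Levenshtein similarity."""
    names = dict(rows)
    result = []
    for a, b in candidate_pairs(rows):
        dist = levenshtein(names[a], names[b])
        if 1 - dist / max(len(names[a]), len(names[b]), 1) >= threshold:
            result.append((a, b, dist))
    return result


# Hypothetical (id, name) rows, e.g. from SELECT id, name FROM objects
rows = [
    (1, "PHP Hypertext Preprocessor"),
    (2, "PHP Hypertext Postprocessor"),
    (3, "MySQL"),
    (4, "PHP Hypertext Preprocessor"),
]
print(similar_rows(rows))  # [(1, 2, 3), (1, 4, 0), (2, 4, 3)]
```

The point of the trigram step is that the more expensive Levenshtein check runs only on pairs that already share many trigrams, instead of on every pair of rows.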

barryhunter