I have an app where people can list stuff to sell/swap/give away, with 200-character descriptions. Let's call them sellers.

Other users can search for things - let's call them buyers.

I have a system set up using Django, MySQL and Sphinx for text search.

Let's say a buyer is looking for "t-shirts". They don't get any results they want. I want the app to give the buyer the option to check a box to say "Tell me if something comes up".

Then when a seller lists a "Quicksilver t-shirt", this would trigger a sort of reverse search on all saved searches to notify those buyers that a new item matching their query has been listed.

Obviously I could trigger Sphinx searches on every saved search every time any new item is listed (in a loop) to look for matches - but this would be insane and intensive. This is the effect I want to achieve in a sane way - how can I do it?

awidgery

1 Answer

You literally build a reverse index!

Store the saved searches in the database, and build a Sphinx index on them.

So 't-shirts' would be a document in this index.

Then, when a new product is submitted, you run its text as a query against this index. Use 'Quorum' syntax, or even match-any mode, so that a saved search matches when it shares just one keyword with the product.

So in your example, the query would be "Quicksilver t-shirt"/1, which means match Quicksilver OR t-shirt. The same works with much longer titles, or even the whole description.

The result of that query would be a list of the original (single-word*) searches that matched. Note this also assumes you have your index set up to treat - as a word character.
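The quorum query for a new listing could be assembled roughly like this. This is only a sketch: `build_quorum_query` is a hypothetical helper name, and the tokenising is a simplifying assumption (it treats - as a word character and drops all other punctuation) rather than proper Sphinx query escaping.

```python
import re


def build_quorum_query(text, threshold=1):
    """Build a quorum query string such as "a b"/1 from free text.

    Hypothetical helper. A token is a run of word characters and hyphens
    that starts with a word character, so "t-shirt" stays one token and a
    lone "-" is dropped.
    """
    words = re.findall(r"\w[\w-]*", text)
    return '"%s"/%d' % (" ".join(words), threshold)


print(build_quorum_query("Quicksilver t-shirt"))
# prints "Quicksilver t-shirt"/1
```

The resulting string is what you would pass to Sphinx as the query against the index of saved searches.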

  • *Note: it's slightly more complicated if you allow more complex queries: multiple keywords, negations, OR, brackets, phrases, etc. In that case the reverse search just gives you POTENTIAL matches, so you need to confirm that each candidate search really matches the new item. That's still a number of queries, but you don't need to run all of the saved searches.
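To illustrate the "potential matches, then confirm" idea, here is a toy in-memory stand-in for the reverse index. It is not Sphinx: the class and method names are made up for the example, saved searches are assumed to be plain keyword lists where every keyword must appear (AND), and there is no stemming, so "t-shirts" would not match "t-shirt" here.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and keep '-' inside words, so "t-shirt" stays one token.
    return set(re.findall(r"\w[\w-]*", text.lower()))


class SavedSearchIndex:
    """Toy stand-in for an index whose documents are saved searches."""

    def __init__(self):
        self.terms_by_id = {}                # search id -> set of terms
        self.ids_by_term = defaultdict(set)  # term -> set of search ids

    def add(self, search_id, query):
        terms = tokenize(query)
        self.terms_by_id[search_id] = terms
        for term in terms:
            self.ids_by_term[term].add(search_id)

    def candidates(self, listing_text):
        # Rough pass: saved searches sharing at least one term with the listing.
        found = set()
        for term in tokenize(listing_text):
            found |= self.ids_by_term.get(term, set())
        return found

    def confirmed(self, listing_text):
        # Fine pass: every term of the saved search must occur in the listing.
        listing_terms = tokenize(listing_text)
        return {sid for sid in self.candidates(listing_text)
                if self.terms_by_id[sid] <= listing_terms}


index = SavedSearchIndex()
index.add(1, "t-shirt")
index.add(2, "large quicksilver t-shirt")
index.add(3, "quicksilver hoodie")

title = "Quicksilver t-shirt from Australia - large, blue"
print(sorted(index.candidates(title)))  # prints [1, 2, 3]
print(sorted(index.confirmed(title)))   # prints [1, 2]
```

Only the candidates get the more expensive confirmation step, which is the saving compared with re-running every saved search for every new listing.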

btw, I think the technical term for these 'reverse' searches is Prospective Search http://en.wikipedia.org/wiki/Prospective_search

barryhunter
  • This is a great start - thanks especially for the correct technical term. If we were to make it a bit more complicated, eg. a saved search of "large quicksilver t-shirt", within a specific distance from a specific lat/long location. Then when a new listing is posted with title eg. "Quicksilver t-shirt from Australia - large, blue" I would have to run the query on this phrase using quorum, but it would return multiple hits which I would then have to cycle through. Would there be any more intelligent way than "rough search, fine search"? – awidgery Apr 19 '13 at 16:10
  • Well, you can store the word count in the reverse index (Sphinx can even compute it with sql_attr_str2wordcount), then use that to affect the ranking. E.g. use the expression ranker to promote results that have a hit_count at least as big as the attribute. This makes it easy to exclude the single-word matches during post-processing. – barryhunter Apr 19 '13 at 17:10
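The post-processing described in the last comment might look something like the sketch below. The row shape is an assumption made for the example: each result is taken to be a `(search_id, stored_word_count, matched_words)` tuple, where `stored_word_count` is the attribute saved with the search and `matched_words` is however many of its words the new listing hit. How those numbers come back from Sphinx is not shown.

```python
def split_matches(rows):
    """Separate full matches from partial ones.

    `rows` is an assumed shape: (search_id, stored_word_count, matched_words).
    A saved search counts as a full match when at least as many words
    matched as it contains.
    """
    full, partial = [], []
    for search_id, stored_word_count, matched_words in rows:
        if matched_words >= stored_word_count:
            full.append(search_id)
        else:
            partial.append(search_id)
    return full, partial


# Hypothetical results for the listing
# "Quicksilver t-shirt from Australia - large, blue".
rows = [
    (1, 1, 1),  # "t-shirt": 1 of 1 words matched
    (2, 3, 3),  # "large quicksilver t-shirt": 3 of 3 words matched
    (3, 2, 1),  # "quicksilver hoodie": only 1 of 2 words matched
]
full, partial = split_matches(rows)
print(full)     # prints [1, 2]
print(partial)  # prints [3]
```

Anything in `partial` can be dropped without a second query. Extra conditions such as distance from a location would still need their own check on the remaining matches.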