
I've spent the last couple of days getting to grips with Sphinx in order to power the backend of an autocomplete feature. Thanks to several SO users (BarryHunter being the most helpful), I now have a fully working setup, complete with several indexes, delta indexers and more.

All that is left is to decide how to automate the delta reindexing and the merging into the core indexes.

My intention is to have the delta indexes updated every 5 minutes, with the delta indexes merged into the core indexes once every 24 hours.
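To make the two schedules concrete, here is a rough Python sketch of the jobs and a helper that decides which are due. The index names `main` and `delta` are hypothetical placeholders for whatever is in `sphinx.conf`; the `indexer --rotate` and `indexer --merge DST SRC --rotate` invocations are standard Sphinx `indexer` usage, but check them against your version.

```python
# Hypothetical index names; substitute the names defined in your sphinx.conf.
MAIN_INDEX = "main"
DELTA_INDEX = "delta"

# (job name, interval in seconds, indexer argument list)
JOBS = [
    # Rebuild the small delta index every 5 minutes.
    ("delta", 5 * 60, ["indexer", "--rotate", DELTA_INDEX]),
    # Merge the delta into the main index once a day.
    ("merge", 24 * 60 * 60,
     ["indexer", "--merge", MAIN_INDEX, DELTA_INDEX, "--rotate"]),
]


def due_jobs(now, last_run):
    """Return the names of jobs whose interval has elapsed.

    last_run maps job name -> timestamp of its last completed run;
    a job that has never run is treated as due.
    """
    due = []
    for name, interval, _cmd in JOBS:
        previous = last_run.get(name)
        if previous is None or now - previous >= interval:
            due.append(name)
    return due
```

In the usual main+delta setup, the delta's source query also has to move its boundary after a merge (for example via a counter table), which this sketch does not handle.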

My understanding is that, in its simplest form, this is achieved by setting up cron jobs. However, I really dislike running cron jobs when I am not 100% confident of how long they will take to complete. The indexes are going to grow very quickly, and I would like to avoid having to deal with reindexing cron jobs overlapping and grinding everything to a halt.
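One common way to stop cron-triggered runs from piling up is to take a non-blocking lock before starting and simply skip the run if the previous one still holds it. A minimal Python sketch of that idea follows; the lock path and the command you pass in are your choice (nothing here is Sphinx-specific), and it also reports how long each run took, which would help answer whether a delta rebuild actually fits in 5 minutes.

```python
import fcntl
import os
import subprocess
import time


def run_exclusive(cmd, lock_path):
    """Run cmd unless another run holding lock_path is still in progress.

    Returns (exit_code, elapsed_seconds), or None if the lock was busy
    and this run was skipped.
    """
    # Open (or create) the lock file; the flock is tied to this open file.
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB: fail immediately instead of waiting for the holder.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # previous run still going: skip this tick
        start = time.monotonic()
        result = subprocess.run(cmd)
        elapsed = time.monotonic() - start
        return result.returncode, elapsed
    finally:
        # Closing the descriptor also releases the lock.
        os.close(fd)
```

Called from a 5-minute cron entry as, say, `run_exclusive(["indexer", "--rotate", "delta"], "/tmp/sphinx-delta.lock")` (hypothetical index name and path), an overrunning rebuild just causes the next tick to be skipped. The util-linux `flock -n` command does the same thing directly in a crontab line.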

For this reason, I have been considering Gearman to manage the workload more effectively.

What I would like to know from more experienced Sphinx users (particularly anyone who has run a similar setup with Gearman) is the following:

  • For starters, is this a good idea?
  • Is this even necessary? (Will indexes in excess of 20 million rows take more than 5 minutes to complete?)
  • Having never used Gearman before, are there any pitfalls I should watch out for?
  • How about using Gearman to manage real-time attribute changes, in order to provide instant index deletion etc.? Is that worthwhile?
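For the attribute-update idea, the Gearman worker's job function mostly has to turn a queued message into a SphinxQL `UPDATE` sent to `searchd`. A rough Python sketch of just that translation step is below; the JSON payload format and the `deleted` attribute are hypothetical, and registering the function with a Gearman worker and sending the statement over searchd's MySQL-protocol listener are left out.

```python
import json
import re

# Index and attribute names are interpolated into the statement, so only
# plain identifiers are accepted for them.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def update_statement(payload):
    """Turn a JSON job payload into a SphinxQL UPDATE statement.

    Hypothetical payload format:
        {"index": "products", "id": 42, "attrs": {"deleted": 1}}
    Only integer attribute values are handled.
    """
    job = json.loads(payload)
    index = job["index"]
    if not _IDENT.fullmatch(index):
        raise ValueError("bad index name: %r" % index)
    doc_id = int(job["id"])
    assignments = []
    for name, value in sorted(job["attrs"].items()):
        if not _IDENT.fullmatch(name):
            raise ValueError("bad attribute name: %r" % name)
        assignments.append("%s=%d" % (name, int(value)))
    if not assignments:
        raise ValueError("no attributes to update")
    return "UPDATE %s SET %s WHERE id=%d" % (index, ", ".join(assignments), doc_id)
```

A flag like `deleted` would then be filtered out at query time. The same change also has to be in the source database, otherwise the next full rebuild of the index brings the document back.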

So this is really a general-advice question about this kind of setup rather than a specific one (which I hope is allowed). I would rather ask here than spend the next 24 hours getting to grips with Gearman, only to find that it is not a good solution for managing Sphinx indexes.

NOTE: I have been searching for information about this exact setup for the last hour and have turned up very little, hence my asking here on SO.

Thanks in advance for any advice offered.

gordyr
  • Can I ask you to elaborate **how** you would use Gearman for this? Not sure I understand. I've talked about using Gearman for distributing updates to RT indexes (via a MySQL trigger on the origin that triggers a background process to update the RT index), but I don't see how Gearman can be used effectively for disk indexes. – barryhunter Oct 16 '12 at 09:45
  • I see... It is likely that my understanding of Gearman is erroneous in that case. My thought was that it would let me run the reindexing asynchronously, in a manner akin to HTML5 Web Workers but for the back end. From your comment it appears this is not the case, which renders this question null and void. That said, using Gearman for attribute updates appears to be the way to go, then? – gordyr Oct 16 '12 at 10:26
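The trigger-plus-background-process pattern Barry describes can be sketched roughly as below. Here an in-process `queue.Queue` and a thread stand in for Gearman (a swapped-in technique, not Gearman's API); in the real setup something on the database side would submit a background job and a Gearman worker would run the loop. The RT index `rt_products` and its `title` field are hypothetical names.

```python
import queue
import threading


def sphinxql_quote(text):
    """Quote a string literal for SphinxQL (escape backslashes and quotes)."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"


def rt_worker(jobs, execute):
    """Drain (doc_id, title) jobs and pass REPLACE statements to execute().

    A None job tells the worker to stop.
    """
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            doc_id, title = job
            # REPLACE inserts the document or overwrites an existing one.
            execute("REPLACE INTO rt_products (id, title) VALUES (%d, %s)"
                    % (int(doc_id), sphinxql_quote(title)))
        finally:
            jobs.task_done()
```

The point of the queue is that the code handling the original write only enqueues and returns, while index updates happen one at a time in the background.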
