
I've inherited a legacy Rails application that I recently moved wholesale to AWS. Part of the infrastructure is a Ferret index that exists on each of the ten backend application servers that build subscriber email every night. We have a large database, so it takes about four hours for each of the backend servers to rebuild the index every day. Across ten app servers, that's a lot of extra hours to keep most of them spun up when they would otherwise be spun down. (All told, each set of indices is about 8 GB per server.)

All the rebuilt Ferret indices are reading the same data from the main database. What I'm wondering is: could I have one application server do the daily rebuild, then rsync the final rebuilt indices to the other application servers? I don't know enough about Ferret (or Rails, for that matter) to know what sorts of dependencies there might be here. I would think that an index is an index, so copying the exact same data to all the servers should be 'noncontroversial', so to speak. Am I out in the weeds or on the right track?

anastrophe
  • 3
    If you do determine that the indexes are not required to be run on the app servers, this sounds like an excellent use-case for [spot instances](http://aws.amazon.com/ec2/spot-instances/). They usually only cost 1/10th of the price of on-demand instances with the condition that they might disappear at any time. – Ladadadada Sep 12 '13 at 10:13

1 Answer


I don't see why not.

I mean, have you tried it?

Ferret is just a Ruby port of Lucene, and you can do neat things with a Lucene index, like rsync it or share it over NFS, as long as the servers only need read-only access to it.

You will, however, have to stop indexing before you run the copy process, so that you get a consistent snapshot of the data in the index.

That said, it sounds like the rebuild is a batch job that runs for a while and then stops, so copying after it finishes should be fine, but I could be wrong.
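As a rough sketch of the copy step, assuming the nightly rebuild has already finished on one server and nothing is writing to the index: the Python below only builds and runs a plain `rsync` invocation per target host. The index path and host names are hypothetical placeholders, not anything taken from the application, and `dry_run` defaults to on so a first pass changes nothing.

```python
import subprocess

# Hypothetical values: adjust to wherever the application keeps its Ferret index.
INDEX_DIR = "/var/www/app/index/"
TARGET_HOSTS = ["app02.example.internal", "app03.example.internal"]


def build_rsync_command(source_dir, host, dest_dir, dry_run=True):
    """Return the rsync argv that mirrors source_dir onto host:dest_dir."""
    # Trailing slashes make rsync copy the directory contents rather than
    # nesting the source directory inside the destination.
    source = source_dir.rstrip("/") + "/"
    dest = f"{host}:{dest_dir.rstrip('/')}/"
    # -a preserves permissions and timestamps; --delete removes stale index
    # files on the target that no longer exist on the source.
    command = ["rsync", "-a", "--delete"]
    if dry_run:
        command.append("--dry-run")
    return command + [source, dest]


def push_index(source_dir, hosts, dest_dir, dry_run=True):
    """Run rsync once per host; raises CalledProcessError if any copy fails."""
    for host in hosts:
        command = build_rsync_command(source_dir, host, dest_dir, dry_run)
        subprocess.run(command, check=True)
```

Calling `push_index(INDEX_DIR, TARGET_HOSTS, INDEX_DIR)` once the rebuild job exits would do a dry run against each host; passing `dry_run=False` performs the real copy.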

My best suggestion to you is to try it. Take a couple of servers out of the pool, and experiment with rsyncing the index between them, then test it. You should be able to define some test cases for validating your hypothesis, right?
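One possible test case, sketched in Python: confirm that the copied index is byte-for-byte identical to the source by comparing SHA-256 checksums of every file in the two directories. This only proves the copy is faithful; it says nothing about how the application uses the index, so a developer would still want to run a few known searches against the copy. The function names here are made up for illustration.

```python
import hashlib
import os


def index_manifest(index_dir):
    """Map each file's path (relative to index_dir) to its SHA-256 digest."""
    manifest = {}
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            full_path = os.path.join(root, name)
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                # Read in 1 MiB chunks so large index files are never
                # loaded into memory all at once.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full_path, index_dir)] = digest.hexdigest()
    return manifest


def manifest_differences(expected, actual):
    """Return sorted relative paths that are missing, extra, or differ in content."""
    paths = set(expected) | set(actual)
    return sorted(p for p in paths if expected.get(p) != actual.get(p))
```

Run `index_manifest` on the source server and on a target after the rsync, then pass both results to `manifest_differences`; an empty list means the two directories hold the same files with the same contents.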


Tom O'Connor
  • 1
    I should have been a little clearer when I said I don't know ferret or rails - I'm a sysadmin, not a coder. So, I asked because I've no idea about the fundamental workings of it besides the server side of things. No idea how I'd test it, as I don't know how the application uses it, so no easy way to see if it broke. There are some contract devs available, maybe they can piece through the code. Anyway, that's why I'm reluctant to just try doing it. It's a production application with a lot of eyeballs on it (I moved the whole thing from legacy iron in a colo to AWS a couple of weeks ago)... – anastrophe Sep 11 '13 at 17:12
  • 8
    You're a rare type, the sysadmin who won't or can't look at the code to figure out how it works. Best of luck to you, I guess. – Tom O'Connor Sep 11 '13 at 21:45
  • Was the snarky reply really necessary? – anastrophe Sep 11 '13 at 23:01
  • 8
    I think so. If I were in your position, I'd get stuck in and do some investigation. It's how we stay ahead of the game. If you don't want to do that, I think you'll find yourself being overtaken by younger, more agile sysadmins. As the Americans say, "step up to the plate". – Tom O'Connor Sep 12 '13 at 09:50
  • 3
    If it's on AWS, why don't you spin up a second, smaller instance to test your changes on first? – ITHedgeHog Sep 13 '13 at 09:40
  • 5
    Snark aside, @TomO'Connor is spot on. The days when it is possible to be *only* a sysadmin, without needing to dirty one's hands in code, are numbered. If we (myself included) want to remain relevant, we need to be comfortable with poking at code, if not implementing it. None of this (Ferret, Lucene, rudimentary Ruby) is at all complex. Like ITHedgeHog said, just spin up another instance (or a local VirtualBox VM) and poke around. You're not going to hurt anything. – EEAA Sep 13 '13 at 13:41
  • Many assumptions in these comments. I wouldn't have asked if I'd had the luxury of free time to play around in code. I'd hoped someone actually knew the answer - nobody does. The issue's resolved, the coders snaked through the 5 years of accumulated code and found no dependency. People/situations are not cookie-cutter. In a competitive market like the SF Bay Area, not being a coder is not a liability for a sysadmin. I asked for help due to lack of spare time, not laziness or ineptitude (voretaq7 - if you delete my comment, delete the other chatty comments too. Don't be one-sided in moderating) – anastrophe Sep 14 '13 at 23:41