
We have a computationally intensive service that performs a number of transformations. It is a largely CPU-bound process. Essentially, a message broker sends messages to the processing service via Thrift.

We have several processing services, each running a different algo on the messages, and each message is routed to one or more of these algos. Our message volumes are variable, and so is the demand on each algo (e.g. a message that contains XYZ is sent to algo 1, otherwise it is sent to algo 2, and we can get many messages of either kind).
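The routing rule described above can be sketched as a small content-based router. This is only an illustration: the class name `AlgoRouter`, the algo names `"algo1"` and `"algo2"`, and the plain substring check for `XYZ` are assumptions, not our actual broker code.

```java
import java.util.List;

// Hypothetical content-based router. The "XYZ" marker and the algo
// names are placeholders taken from the example in the question.
final class AlgoRouter {

    private AlgoRouter() {
    }

    // Returns the algos a message should be sent to. A list is returned
    // because a message may be routed to one or more algos.
    static List<String> route(String message) {
        if (message.contains("XYZ")) {
            return List.of("algo1");
        }
        return List.of("algo2");
    }
}
```

With a rule like this, a burst of messages containing XYZ translates directly into a burst of load on algo 1, which is the imbalance we want to handle.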

We would like to expand this into something that is horizontally scalable, with multiple nodes running the processing algos. Depending on the message load, our Thrift requests should be sent to different servers (assume that all services are running an instance of each processing algo, Algo 1 to 3). For example, if we are getting a large number of messages that we want to process on Algo 1, then two servers run Algo 1 and the third server looks after requests for the other two algos (Algo 2 and 3).

So the system looks like this:

Client ----Request-------|
              -----------|--------------------
              | Coord & Load Balancer Service | ... like zookeeper
               --------------------------------
                      <--|-->
                         |    Route messages to servers...
   Server1:               Server2:          Server 3:
Algo1 instance        Algo1 instance      Algo2 instance
                                          Algo3 instance    
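
The layout in the diagram can be written down as a routing table from algo to servers, with the caller rotating through the servers for each algo. This is a minimal sketch with no ZooKeeper involved: the class name `AlgoRoutingTable` and the server names are hypothetical, and round robin is just one simple policy chosen for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical routing table: which servers currently serve which algo.
final class AlgoRoutingTable {

    private final Map<String, List<String>> serversByAlgo;
    // One counter per algo so each algo rotates through its own servers.
    private final Map<String, AtomicInteger> counters = new ConcurrentHashMap<>();

    AlgoRoutingTable(Map<String, List<String>> serversByAlgo) {
        this.serversByAlgo = Map.copyOf(serversByAlgo);
    }

    // Picks the next server for the given algo in round-robin order.
    String pick(String algo) {
        List<String> servers = serversByAlgo.get(algo);
        if (servers == null || servers.isEmpty()) {
            throw new IllegalStateException("no server registered for " + algo);
        }
        int n = counters.computeIfAbsent(algo, k -> new AtomicInteger()).getAndIncrement();
        // floorMod keeps the index valid even if the counter overflows.
        return servers.get(Math.floorMod(n, servers.size()));
    }
}
```

For the diagram above, the table would hold `algo1 -> [server1, server2]`, `algo2 -> [server3]` and `algo3 -> [server3]`, so successive requests for Algo 1 alternate between the first two servers.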

All processes are written in Java.

So how easy would something like this be to set up using ZooKeeper? I know that as we add or change algos we can easily use ZooKeeper to handle the config side of things (i.e. servers listen for algo updates or additions and serve them as configured), but how do we manage the load-balancing aspect?

Cheers!

NightWolf
  • Hi, have you solved this? Can I ask how you did it? I'm researching the same question right now (scalable, self-maintainable Thrift service deployment), and any experience or advice would be nice to get ;-) – Sergey Vasilyev Dec 04 '11 at 18:35
  • @Sergey Vasilyev, we implemented our own solution atop ZooKeeper, in a similar manner to the way Norbert does things (with far fewer features). We may consider adopting Norbert if some free time rolls around; it is a very nice-looking project. IMO don't do what we did; take a look at Jonas's answer below. Let me know if you still want more info on our impl. – NightWolf Dec 06 '11 at 13:35
  • Yes, similar to Norbert, except that I already use Python & PHP and am going to use Java & C/C++ in the future (near or far, I haven't decided yet). That is why I've chosen Thrift as an RPC solution. The only thing left is load balancing & cluster management. ZooKeeper looks good for this, but I'm looking for particular architectural solutions. Norbert, as seen in that picture, is the base concept of what I am looking for. So, is it good? Is it stable? Is it easy to manage? Or can you describe how you solved this? My jabber is "nolar@nolar.info", Skype is "nolar.info". Thanks :) – Sergey Vasilyev Dec 06 '11 at 18:26
  • PS: When I ask "is it good", etc., I mean the concept, not Norbert itself :-) – Sergey Vasilyev Dec 07 '11 at 03:49
  • It really depends on your use case; horses for courses, right? For us (distributed classification) it makes perfect sense. In fact, it makes sense for any application where you would like to avoid a centralized load balancer. There are a few points to note, however, mostly around trust, i.e. the fact that our clients are responsible for the load balancing. It would be possible for a malicious client to hit loaded servers to create a DoS situation. With a transparent centralized load balancer this isn't as much of an issue. So if you design your infrastructure with this in mind, it's a great solution. – NightWolf Dec 07 '11 at 12:05

2 Answers


You guys probably want something like Norbert from LinkedIn: http://sna-projects.com/norbert/ It uses persistent peer-to-peer communication between clients and servers, with ZooKeeper for service registry and out-of-band signaling. Pretty cool stuff. It lets you just fire up another processing node to help handle requests during high load.
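
To make the registry idea concrete, here is a rough in-memory sketch of the pattern: servers register under the algo they serve, and clients subscribe so their local view is refreshed when membership changes. This is not Norbert's API and not the ZooKeeper client API. `InMemoryRegistry` and `ClientView` are hypothetical names, and the map inside the registry merely stands in for what would be ephemeral nodes and watches in a real ZooKeeper deployment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// In-memory stand-in for a ZooKeeper-backed service registry (hypothetical).
final class InMemoryRegistry {

    private final Map<String, Set<String>> serversByAlgo = new ConcurrentHashMap<>();
    private final List<BiConsumer<String, List<String>>> listeners = new CopyOnWriteArrayList<>();

    // Listeners receive (algo, current server list) on every membership change.
    void subscribe(BiConsumer<String, List<String>> listener) {
        listeners.add(listener);
    }

    // A server announces that it serves an algo, e.g. when a new node starts.
    synchronized void register(String algo, String server) {
        if (serversByAlgo.computeIfAbsent(algo, k -> new TreeSet<>()).add(server)) {
            notifyListeners(algo);
        }
    }

    // A server leaves, e.g. on shutdown or when its session is lost.
    synchronized void unregister(String algo, String server) {
        Set<String> servers = serversByAlgo.get(algo);
        if (servers != null && servers.remove(server)) {
            notifyListeners(algo);
        }
    }

    // Returns a sorted copy of the servers currently registered for an algo.
    synchronized List<String> serversFor(String algo) {
        return new ArrayList<>(serversByAlgo.getOrDefault(algo, Set.of()));
    }

    private void notifyListeners(String algo) {
        List<String> snapshot = serversFor(algo);
        for (BiConsumer<String, List<String>> listener : listeners) {
            listener.accept(algo, snapshot);
        }
    }
}

// Client-side cache of the registry. The client chooses servers from this
// local view, so no central load balancer sits on the request path.
final class ClientView {

    private final Map<String, List<String>> view = new ConcurrentHashMap<>();

    ClientView(InMemoryRegistry registry) {
        // Only changes made after subscribing are seen in this sketch.
        registry.subscribe(view::put);
    }

    List<String> serversFor(String algo) {
        return view.getOrDefault(algo, List.of());
    }
}
```

In this sketch, firing up another node for Algo 1 amounts to one more `register("algo1", ...)` call, after which every subscribed client sees the extra server and can start sending it requests.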

/ Jonas

Jonas Bergström
  • Very nice. We ended up implementing something like this, nowhere near as full-featured, but a similar concept. Shame we didn't know about this a few months ago. Thanks! – NightWolf Dec 06 '11 at 13:33

Take a look at DynamicPool and ZooKeeperNode. A default wiring is available in ThriftFactory for reference. http://twitter.github.com/commons/

Yony