Pentaho Carte Load Balancing

Question

Is there any simple way to send jobs remotely to the master Carte server and have it delegate each job to a different slave server?

From what I have read, my only option for out-of-the-box load balancing in Pentaho is to adjust the clustering configuration on the steps within my transformation; the transformation steps carrying this configuration then make use of the defined slave servers. This gives me a "sort of" load balancing approach, but really it is parallelization within individual jobs.

That's not what I'm looking for. What I need is a simpler approach that avoids the complexity of in-job parallelization and simply passes each job or transformation to a different slave in, say, a round-robin fashion, thus exercising all the hardware rather than running everything on the master.
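For comparison, the rotation being asked for is easy to do on the client side, outside Carte. Below is a minimal Python sketch, assuming Carte's executeJob servlet (/kettle/executeJob/ with the rep, user, pass and job query parameters, which runs a job stored in a repository configured on that Carte server). The server URLs, repository name and credentials are hypothetical placeholders, and the sketch only builds the request URLs in rotation; it does not send them.

```python
from itertools import cycle
from urllib.parse import urlencode


class CarteRoundRobin:
    """Hypothetical client-side dispatcher: one Carte server per job, in rotation.

    Assumes each Carte server exposes /kettle/executeJob/ with the rep, user,
    pass and job query parameters and has the named repository configured.
    Check these against the Carte documentation for your PDI version.
    """

    def __init__(self, servers, repository, user, password):
        if not servers:
            raise ValueError("at least one Carte server is required")
        self._servers = cycle(servers)  # endless rotation over the server list
        self._params = {"rep": repository, "user": user, "pass": password}

    def next_request(self, job):
        """Return (server, url) for this job and advance to the next server."""
        server = next(self._servers)
        query = urlencode({**self._params, "job": job})
        return server, f"{server.rstrip('/')}/kettle/executeJob/?{query}"


if __name__ == "__main__":
    balancer = CarteRoundRobin(
        ["http://carte-1:8081", "http://carte-2:8081", "http://carte-3:8081"],
        repository="my_repository", user="admin", password="secret",
    )
    for job in ["load_customers", "load_orders", "load_products", "load_invoices"]:
        server, url = balancer.next_request(job)
        # Sending is left out: a real client would GET this URL with the
        # Carte server's HTTP basic-auth credentials (e.g. via urllib.request).
        print(server, url)
```

Because each call to next_request advances the cycle, four jobs sent to three servers land on carte-1, carte-2, carte-3 and then carte-1 again. The rotation is blind to load, so a long-running job on one server does not stop that server from getting its next turn.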

Thanks in advance

MrMauricioLeite · Answer 1 · 2016-01-02T22:46:50.470

I hope my answer helps. I'm not a specialist, just a fellow Pentaho user trying to do exactly what you described, and this is my experience so far:

(If anyone finds something wrong in my answer, please let me know. I want to learn too =D)

What are PDI clusters? A scale-out solution

Pentaho Data Integration clusters are awesome (1) for breaking huge transformations that use a lot of CPU/memory into smaller chunks and (2) for speeding up execution with a clever design, or at least for making it run on commodity hardware (not a huge server with 24 CPUs and 256 GB of RAM).

Is there a way to automatically distribute transformations (round-robin) inside the cluster?

I'm sorry to say that so far I haven't been able to do that on my AWS instances. I used three EC2 instances to test distribution with the following setups:

One master, two slaves - I sent every transformation entry to be executed by the same master, hoping it would round-robin between the slaves and only execute a transformation itself when the slaves were busy. That didn't happen: the master took all the work for itself and the slaves did nothing. (The same happens if you send a job that has parallel transformations to run.)
Three masters, via Elastic Load Balancer - AWS's ELB is an awesome way to distribute app requests from different sources across all your EC2 instances, and I thought it could help me distribute my transformations to all the Carte machines (all masters). It turns out that if the same host makes the requests, you get pointed to the same EC2 instance. So every time I sent the test job to run, one random master took all the requests and the others just sat there waiting. No good news here.
Three masters, Route 53 - Route 53 is the AWS DNS service and can route your website/webapp requests in many different ways, one of which is round-robin. But I got the same problem Elastic Load Balancer gave me: one random server got all the work, so no good news here either.

Possible solution

Well, it's not all a nightmare; you actually can distribute your transformations across a bunch of other machines. But neither Carte, nor Elastic Load Balancing, nor Route 53 will do the round-robin for you. What you do is add all your slave servers (or master servers) to your job and assign a different slave server to each Transformation entry. That's done in the entry's Advanced tab, as in the screenshot.
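Choosing which slave goes on which entry is just a round-robin plan worked out ahead of time. The hypothetical Python helper below is not part of PDI; it only computes the pairing you would then set by hand in each entry's Advanced tab, so the entries are spread evenly across the slaves. The entry and slave server names are placeholders.

```python
def assign_slaves(entries, slaves):
    """Pair each Transformation entry with a slave server, round-robin.

    Entry i is paired with slave i mod len(slaves). Returns a list of
    (entry, slave) tuples so entries with repeated names are kept.
    """
    if not slaves:
        raise ValueError("at least one slave server is required")
    return [(entry, slaves[i % len(slaves)]) for i, entry in enumerate(entries)]


if __name__ == "__main__":
    # Hypothetical entry and slave server names.
    plan = assign_slaves(
        ["load_customers", "load_orders", "load_products", "load_invoices"],
        ["slave-1", "slave-2", "slave-3"],
    )
    for entry, slave in plan:
        print(f"{entry} -> {slave}")
```

With four entries and three slaves, the fourth entry wraps around to slave-1. Since the plan is fixed when the job is designed, it does not react to how busy each slave actually is.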

Thanks for sharing your experiences; it is most helpful. – orion_kid, Jan 28 '16 at 09:46
