4

Is it possible to replicate a complete JVM and, in case of failover, simply switch the load over to the replicated JVM?

If yes, how can we do it?

Cœur
Ranger
  • failover in case of the JVM, sounds way too complicated :), but hopefully someone else knows. – Eugene Dec 27 '11 at 10:08

3 Answers

2

In case your application is a web app, read about "Clustering" and "Load Balancing". Most application servers support clustering.

You can also have a look at JGroups, which provides inter-JVM communication.

Manish
  • Thanks for the response. Yes, we are actually using Jetty as the application server, JGroups for clustering, and Infinispan for caching. But changing all the data structures to use Infinispan will take much more time, which is why I was wondering whether I could replicate the whole JVM, since the load will not be that high in the beginning. I will explore JGroups to figure it out. – Ranger Dec 27 '11 at 10:11
  • @Ranger The CPU can update memory at a rate of 6 GB/s, which is roughly 48 Gb/s. If you try to replicate that over a 1 Gb/s or even a 10 Gb/s link, you may find that you are always behind, or that you have to slow your server dramatically so the second site is never too far behind. Also, rolling back an incomplete update is very complex. – Peter Lawrey Dec 27 '11 at 10:16
  • @Peter Yes, absolutely, performance will go down, but for the coming 2-3 months we can live with degraded performance and put in more machines. And yes, transaction management is a heck of a job, and it is taking a lot of time in the case of the cache. So the primary objective is to reduce the development time for failover. Any suggestions? :) – Ranger Dec 27 '11 at 10:21
  • Copy all the inputs from the primary site to the secondary site. If you order these inputs the same way, the secondary site should be in the same state as the primary (except for the lag between sites) – Peter Lawrey Dec 27 '11 at 10:26
  • Are clustering and load balancing the same thing? – Mister Smith Dec 27 '11 at 10:53
  • Going ahead with this approach, as it seems to be a long-lasting, viable solution. Final take: JGroups for clustering, Infinispan for caching, a mix of Apache and an in-house framework for load balancing, plus a lot of application logic to store data in the cache with transaction support and to reconstruct the state from the cache in case of failover. – Ranger Dec 27 '11 at 13:01
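The bandwidth argument in the comments above can be checked with a few lines of arithmetic. This is only a rough sketch: the 6 GB/s write rate and the 10 Gb/s link are the figures quoted in the comment, and the class and method names are made up for illustration.

```java
public class ReplicationLag {

    /** Converts a link speed in gigabits per second to bytes per second. */
    static double gigabitsToBytesPerSec(double gigabitsPerSec) {
        return gigabitsPerSec * 1e9 / 8;
    }

    /**
     * Bytes of unreplicated data that pile up after the given number of
     * seconds when memory is written faster than the link can carry it.
     */
    static double backlogBytes(double writeBytesPerSec, double linkBytesPerSec, double seconds) {
        return Math.max(0, writeBytesPerSec - linkBytesPerSec) * seconds;
    }

    public static void main(String[] args) {
        double writeRate = 6e9;                        // 6 GB/s, figure from the comment
        double linkRate = gigabitsToBytesPerSec(10);   // 10 Gb/s link
        double backlog = backlogBytes(writeRate, linkRate, 1.0);
        System.out.printf("Backlog after 1 second: %.2f GB%n", backlog / 1e9);
    }
}
```

Under these assumed rates the replica falls further behind every second the primary runs at full speed, which is why replicating inputs is usually preferred over replicating memory.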
1

This is not something that's done at the JVM level, but there are many products out there that handle this at the message-processing level. Usually this is a feature of an Enterprise Service Bus (ESB). Search for that term and you will get some ideas.

Francis Upton IV
  • Thanks for the quick response. I am using JGroups for messaging and Infinispan for caching, but changing all the data structures to use Infinispan was taking a lot of time, which is why I was planning to start with whole-JVM replication. Just out of curiosity, can we replicate the whole machine with hardware replication? – Ranger Dec 27 '11 at 10:16
  • Look into virtual machines for replicating whole machines. There is a lot of technology to do this, though you will also need some support from an ESB or other technology to make the messages flow to the right place (in Java land). GigaSpaces, I think, is a commercial company that has some technology that might be useful here. – Francis Upton IV Dec 27 '11 at 10:21
1

Yes, in theory, but the main problem is that your application would be much, much slower (on the order of 100x to 1000x), and this is what puts most people off doing a full replication.

Instead, you need to produce a data stream of the important pieces of information, e.g. all the input or output messages (or both), send it to the second machine, and rebuild the state there from that data.
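As a minimal sketch of that idea (not a production design), the example below applies the same ordered input messages to a primary and a secondary copy of the state. The `Input` message format, the `Balances` state, and the in-memory list standing in for the network stream are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplayDemo {

    /** One input message: a sequence number plus an account and an amount (hypothetical format). */
    static final class Input {
        final long seq;
        final String account;
        final long delta;

        Input(long seq, String account, long delta) {
            this.seq = seq;
            this.account = account;
            this.delta = delta;
        }
    }

    /** Deterministic state: the same inputs in the same order always give the same result. */
    static final class Balances {
        private final Map<String, Long> state = new HashMap<>();
        private long lastSeq = 0;

        void apply(Input in) {
            // Reject gaps and reordering, since replay only works on an ordered stream.
            if (in.seq != lastSeq + 1) {
                throw new IllegalStateException("Expected seq " + (lastSeq + 1) + " but got " + in.seq);
            }
            state.merge(in.account, in.delta, Long::sum);
            lastSeq = in.seq;
        }

        Map<String, Long> snapshot() {
            return new HashMap<>(state);
        }
    }

    public static void main(String[] args) {
        List<Input> stream = new ArrayList<>(); // stands in for the link to the second machine
        Balances primary = new Balances();
        Balances secondary = new Balances();

        Input[] inputs = {
            new Input(1, "alice", 100),
            new Input(2, "bob", 50),
            new Input(3, "alice", -30)
        };

        // The primary records each input before applying it.
        for (Input in : inputs) {
            stream.add(in);
            primary.apply(in);
        }

        // The secondary rebuilds its state purely from the recorded inputs.
        for (Input in : stream) {
            secondary.apply(in);
        }

        System.out.println("States match: " + primary.snapshot().equals(secondary.snapshot()));
    }
}
```

This only works if applying an input is deterministic, so anything that depends on wall-clock time, random numbers, or thread scheduling would need to be captured as an input too.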

BTW: When you lose a TCP connection to the server, it has to be closed and re-established. TCP connections are not failed over transparently. UDP avoids this issue by not having connections, but it is much harder to work with reliably. One way around this is to have a simple proxy/load-balancing server that sits between the client and the server. Because it is simple, it is less likely to fail, and it hides the reconnection to the server. However, if you have a data centre failure, the proxy will be gone as well.
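The reconnection step that such a proxy performs on behalf of its clients (or that a client has to perform itself) could look roughly like this. It is a sketch using only `java.net`; the class name, the method name, and the idea of walking a fixed list of endpoints in order are assumptions made for illustration, and the demo uses local sockets rather than real servers.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

public class FailoverConnector {

    /**
     * Tries each endpoint in order and returns a socket connected to the
     * first one that accepts. Throws if none of them can be reached.
     */
    static Socket connectToFirstAvailable(List<InetSocketAddress> endpoints, int timeoutMillis)
            throws IOException {
        IOException last = null;
        for (InetSocketAddress endpoint : endpoints) {
            Socket socket = new Socket();
            try {
                socket.connect(endpoint, timeoutMillis);
                return socket;
            } catch (IOException e) {
                last = e;        // remember the failure and move on to the next endpoint
                socket.close();
            }
        }
        throw new IOException("No endpoint reachable", last);
    }

    public static void main(String[] args) throws IOException {
        // Local stand-ins: a "primary" port that is closed and a "secondary" that is listening.
        ServerSocket closed = new ServerSocket(0);
        int deadPort = closed.getLocalPort();
        closed.close();

        try (ServerSocket secondary = new ServerSocket(0)) {
            List<InetSocketAddress> endpoints = Arrays.asList(
                    new InetSocketAddress("127.0.0.1", deadPort),
                    new InetSocketAddress("127.0.0.1", secondary.getLocalPort()));
            try (Socket socket = connectToFirstAvailable(endpoints, 1000)) {
                System.out.println("Connected to port " + socket.getPort());
            }
        }
    }
}
```

Note that this only re-establishes the connection. Any request that was in flight when the old connection dropped still has to be retried or reconciled by the application.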

Peter Lawrey
  • Yup, this is exactly what we are doing now: keep the number of commits to the cache farm to a minimum, and read from the cache in case of failover. – Ranger Dec 03 '12 at 05:54