Until recently, I assumed that Microsoft NLB worked at the OS/machine level rather than at the application level, i.e. that NLB just monitors heartbeats to check whether a machine is alive and takes a node out of rotation if it has gone down.
However, I found a comment on a Server Fault question that claims otherwise. According to the comment:
NLB just routes connections to the TCP port that is open. If your application closes the port then NLB won't route connections to it any more until the port is open again.
- Is the above true? Does NLB monitor applications at a port level?
- If the answer to the first question is 'yes', will it switch over both when the service goes down and when the service hangs, or only in one of those cases?
- If NLB does indeed do all of the above, what is the case for using clustering at all? The only advantage I can see is that clustering does not need replicated data, but overall clustering would be the more expensive solution.
- Will the answers to the above questions be different for a standard product like MS SQL Server as opposed to my own service, or are they the same?
- If NLB does not do the above and only does OS/machine-level heartbeats, is there a way other than clustering to provide HA and switchover for my own service?
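
To make the "service down" versus "service hung" distinction in the questions above concrete, here is a minimal sketch of an external application-level probe. It is only an illustration and says nothing about what NLB itself does. The function name `probe`, the `PING` request, and the three result labels are all hypothetical; a real service would need its own request and reply format.

```python
import socket


def probe(host, port, timeout=2.0, request=b"PING\n"):
    """Classify a TCP service as 'down', 'hung', or 'up' (hypothetical labels).

    'down': the port refuses the connection, or the peer closes or resets
            the connection without sending a reply.
    'hung': the port accepts the connection but no reply arrives in time.
    'up':   the service sends back at least one byte.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(request)
            try:
                data = sock.recv(1024)
            except socket.timeout:
                # Port is open, but the application never answered.
                return "hung"
            # An empty read means the peer closed without replying.
            return "up" if data else "down"
    except OSError:
        # Connection refused, reset, unreachable, or connect timed out.
        return "down"
```

A check that only tests whether the port is open would report the 'hung' case as healthy, which is why the second question matters: the two failure modes need different detection.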