We just completed a fairly complex network design for our new office. It has 2 ADSL routers connected to a Dual WAN Load Balancer router. The Load Balancer is connected to two 16-port switches, which connect 30 PCs. One 16-port switch is also connected to the other 16-port switch, which in turn connects to the Load Balancer.

So my PC has this logical path: PC >> SWITCH A >> SWITCH B [Optional] >> Load Balancer >> ADSL Modem [one of two available in network]

As I was facing some weird problems, I decided to run some diagnostics. My internet connection mostly works fine, but HTTP POSTs and file uploads sometimes time out.

Traceroute to an external server (I get the same output for Google, Facebook, etc.). The number of hops is always 15.

[rtcamp@main ~]$ traceroute rtcamp.com
traceroute to rtcamp.com (70.32.85.76), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  rtcamp.com (70.32.85.76)  362.911 ms  364.550 ms  366.284 ms

Traceroute to the Load Balancer router:

[rtcamp@main /]$ traceroute 192.168.0.1
traceroute to 192.168.0.1 (192.168.0.1), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *

My biggest problem is this: we have created a public server for our subdomain, like sub.example.com. sub.example.com works from the outside world, but cannot be reached from within our own network.

I think if I can get normal traceroute output, things will be solved.

Any solution or idea?

Thanks,

-Rahul


Added on 10 September

Details of our network setup

  1. We have a network of 192.168.0.x
  2. 192.168.0.1 is the load balancer
  3. 192.168.1.1 is ADSL modem A
  4. The other ADSL modem is in bridge mode
  5. We have PCs from 192.168.0.2 to 192.168.0.50 (PCs get their IP addresses dynamically)
  6. 192.168.0.101/2 are for the server in the LAN. It's only one server with 2 LAN cards, so it has 2 IP addresses.
  7. 192.168.0.200 is the Wi-Fi router, and IP addresses from 192.168.0.201 onwards are for laptops connected to it. The Wi-Fi router also gets LAN IP 192.168.0.100 from the load balancer on its Ethernet interface.
rahul286
  • Post a traceroute from your PC to your routers, and to your load balancer as well (remember to remove your external IP). – Joseph Kern Sep 02 '09 at 09:34
  • @Joseph Added more info. What did you mean by "remember to remove your external IP"? Should I disconnect the WAN (ADSL modems) before posting the traceroute? – rahul286 Sep 02 '09 at 11:20
  • Can you list all of your IP addresses, the ones assigned to the hosts used for testing, and the ones assigned to the server you're trying to get to? Also, can you show the MAC address table and the MAC addresses used by the systems? It could be something as simple as your server not listening for or responding to ARP. I would also try grabbing Wireshark and seeing what the communications really look like. – Kevin Nisbet Sep 07 '09 at 03:16
  • @Kevin - Added details of network setup. Please let me know if any other info which I can provide. – rahul286 Sep 10 '09 at 14:47
  • OK, I might have an idea what's going on. Since you're on a private IP range, some sort of NAT must happen outside your local LAN. For one, it looks like the LB doesn't respond (which could be quite normal for security reasons). However, moving out onto the internet, you're NATed to whatever your modem's public IP is. This NAT may not be allowing the ICMP time-exceeded messages back through itself to reach your PC. The reason you see the last hop is that the destination is actually replying as itself. – Kevin Nisbet Sep 16 '09 at 05:02
  • Did this ever get resolved? – Joseph Kern Sep 18 '09 at 09:49
  • @Kevin & @Joseph - I tried everything but gave up on this a few days back. I removed the load balancer, and now we are using two simple 16-port switches with one ADSL modem. A friend of mine, who is a network admin at a big corporation, came by and suggested using Fedora 10. We are using Fedora 11. His point is that the latest Linux distros are never safe to use. Either way, I gave up! I am a PHP guy. This network thing distracted me very much. :-( – rahul286 Oct 08 '09 at 13:43
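
To make the NAT explanation in the comments above concrete, here is a minimal Python sketch of how traceroute output comes out when ICMP time-exceeded replies are dropped on the way back. It is a toy model, not a description of what the load balancer in this thread actually did: the function name `simulate_traceroute`, its parameters, and the `router1`..`router14` hop names are all made up for illustration. Only the two target addresses come from the traces in the question.

```python
from typing import List, Optional


def simulate_traceroute(
    path: List[str],
    drop_time_exceeded: bool,
    destination_replies: bool = True,
    max_hops: int = 30,
) -> List[Optional[str]]:
    """Toy model: one entry per TTL, an address or None for '* * *'.

    path lists every hop in order; its last element is the destination.
    """
    results: List[Optional[str]] = []
    for ttl in range(1, max_hops + 1):
        if ttl < len(path):
            # The probe expires at an intermediate router, which answers with
            # ICMP time-exceeded. If something on the way back drops that
            # message, traceroute shows '* * *' for this hop.
            results.append(None if drop_time_exceeded else path[ttl - 1])
        elif destination_replies:
            # The probe reaches the target, which answers for itself.
            results.append(path[-1])
            break
        else:
            # The target is reached but stays silent, so probing continues.
            results.append(None)
    return results


# Hypothetical 15-hop path: 14 unnamed routers, then the server from the trace.
external = ["router%d" % n for n in range(1, 15)] + ["70.32.85.76"]
for ttl, hop in enumerate(simulate_traceroute(external, drop_time_exceeded=True), 1):
    print("%2d  %s" % (ttl, hop or "* * *"))
```

Under these assumptions the external trace shows 14 starred hops followed by the destination, and a target that never answers (as 192.168.0.1 appears to do) shows stars all the way to the 30-hop limit.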

4 Answers


One thing: if you're using a Solaris-based traceroute, you can run traceroute -I rtcamp.com, which uses ICMP for the traceroute. We do this at work since UDP traceroute is blocked on our firewall.

The other thing: you may have an ACL on your WAN router, or a firewall not mentioned, that is blocking the ICMP time-exceeded message. If you allow these messages, at least internally, traceroute should work. There is no risk in allowing this; only some types of ICMP messages are bad.

As for the clients not being able to talk to the servers, are they on the same subnet, or are they on a separate, possibly secured network that is non-routable? It sounds like a rather simple network, but it also sounds like your WAN router might be specialized and might not do actual routing internally.

Kevin Nisbet
  • A firewall wouldn't explain the last ICMP getting through. – Joseph Kern Sep 02 '09 at 09:41
  • I am using Fedora 10. I get the same output on a Mac as well as on a Windows PC connected to the LAN. – rahul286 Sep 02 '09 at 11:21
  • A firewall could easily explain the last ICMP; it doesn't necessarily need to block ICMP from every source/destination. – Kevin Nisbet Sep 07 '09 at 03:08
  • By default there was no firewall active. Now I have added a whitelist rule for ICMP packets, so I am damn sure that my firewall isn't blocking any ICMP packets. – rahul286 Sep 10 '09 at 14:49
  • @Kevin - Just a simple question. Is the traceroute I posted above normal? Have you seen such a weird thing before? – rahul286 Sep 10 '09 at 14:52
  • It could be normal, but I doubt it, looking at your post again. The one where you go to the public IP is probably normal, but using your internal IP you should be able to get to the server directly without routing (as in no hops; you're just switched to the host). – Kevin Nisbet Sep 16 '09 at 04:56
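
The "same subnet" question raised in this answer can be checked mechanically. Below is a small Python sketch using the standard `ipaddress` module. It assumes the LAN uses a /24 mask (255.255.255.0), which the question never states, and `192.168.0.25` is just a hypothetical PC picked from the DHCP range mentioned in the question. The helper name `same_lan` is made up for this example.

```python
import ipaddress

# Assumption: the office LAN is 192.168.0.0 with a /24 mask.
LAN = ipaddress.ip_network("192.168.0.0/24")


def same_lan(a: str, b: str, network: ipaddress.IPv4Network = LAN) -> bool:
    """True if both addresses fall inside the given network."""
    return ipaddress.ip_address(a) in network and ipaddress.ip_address(b) in network


pc = "192.168.0.25"  # hypothetical PC from the 192.168.0.2-50 DHCP range

for target in ("192.168.0.101", "192.168.0.1", "192.168.1.1", "70.32.85.76"):
    verdict = "switched directly" if same_lan(pc, target) else "needs a router"
    print("%-15s %s" % (target, verdict))
```

If the /24 assumption holds, the PC and the server at 192.168.0.101 share a subnet, so traffic between them should not need any router at all, while ADSL modem A at 192.168.1.1 sits on a different network.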

It looks to me like you have a switching loop. Do the switches run STP?

joeqwerty
  • I am using D-Link unmanaged switches. I don't think I have control over any aspect of the switches. – rahul286 Sep 02 '09 at 11:22
  • The switches are interconnected right? It sounded from your description like there are multiple paths through the switches to the LB router. If that's the case, and if the switches are not running STP, then there's a switching loop. Who manages the switches? Can you contact them to find out how they're interconnected and whether or not STP is running? – joeqwerty Sep 02 '09 at 12:13
  • The other thing that makes me think it's a switching loop and not a routing loop is that typically with a routing loop you'll see the trace route bouncing between the ip addresses of the routers, which you don't see here. – joeqwerty Sep 02 '09 at 19:27
  • I cannot rule out the possibility of a switching loop, as I have 2 switches in the network. My load balancing router is a Dual WAN router with 4 LAN ports. If I connect the two 16-port switches to different LAN ports, will it solve the problem? As of now SWITCH A is connected to LAN port 1 and SWITCH B is connected to one port of SWITCH A. – rahul286 Sep 06 '09 at 08:39
  • #Update# Just connected both switches to different LAN ports on the router directly. The problem is not solved. By the way, should I use "switching loop" as a keyword to Google for more info on this? Any other keywords you can suggest? – rahul286 Sep 06 '09 at 08:46
  • Can traffic go from one switch through the router to the other switch? If so, and if the switches are also connected together, then it probably is a switching loop. Do the switches have a management interface? If so, look at the MAC address tables and see whether any MAC addresses are registered on multiple ports on both switches. If there are, that means there is a switching loop. – joeqwerty Sep 06 '09 at 15:47
  • If it were a switching loop, by now the network would be close to completely unusable, since the looped ports would be out of bandwidth, and anything else you were doing, like heading out to the internet, would probably show the lost packets. – Kevin Nisbet Sep 07 '09 at 03:11
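
The MAC-table check suggested in the comments above could look roughly like the Python sketch below. Everything in it is hypothetical: the unmanaged switches in this thread expose no MAC table, so the rows are invented sample data, and the function name `macs_on_multiple_ports` is made up. It also uses a simplified rule, flagging a MAC address learned on more than one port of the same switch, since a MAC normally shows up once per switch (on the uplink port of the switch it is not attached to).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int, str]  # (switch name, port number, MAC address)


def macs_on_multiple_ports(table: Iterable[Row]) -> Dict[Tuple[str, str], List[int]]:
    """Return {(switch, mac): [ports]} for MACs seen on more than one port of a switch."""
    ports = defaultdict(set)
    for switch, port, mac in table:
        # Normalise case so 'AA:BB' and 'aa:bb' count as the same address.
        ports[(switch, mac.lower())].add(port)
    return {key: sorted(seen) for key, seen in ports.items() if len(seen) > 1}


# Invented sample rows, purely for illustration.
sample = [
    ("A", 1, "aa:bb:cc:00:00:01"),
    ("A", 5, "aa:bb:cc:00:00:02"),
    ("A", 7, "AA:BB:CC:00:00:01"),  # same MAC as on port 1, different port
    ("B", 2, "aa:bb:cc:00:00:01"),  # once on switch B is normal
]

for (switch, mac), seen in macs_on_multiple_ports(sample).items():
    print("switch %s: %s seen on ports %s" % (switch, mac, seen))
```

A hit here is only a hint, not proof of a loop: a device that was recently moved between ports can show up the same way until the old entry ages out.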

Something is blocking ICMP. The blocking of ICMP time exceeded in transit messages is breaking traceroute. The blocking of ICMP fragmentation needed messages is breaking TCP. Most likely it's the load-balancing router.
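
A rough Python sketch of why blocked "fragmentation needed" messages would hurt uploads but not ordinary browsing: small packets fit every link, while full-size packets sent with Don't Fragment set are dropped at a smaller link, and the sender only learns to shrink them if the ICMP message gets back. The MTU values are assumptions, not facts from this thread: 1500 bytes for the Ethernet LAN and 1492 bytes for the ADSL side, which is a common value when PPPoE is in use. The function name `deliver` is made up.

```python
from typing import List


def deliver(packet_size: int, link_mtus: List[int], icmp_blocked: bool) -> str:
    """Model one packet sent with Don't Fragment set across a chain of links."""
    for mtu in link_mtus:
        if packet_size > mtu:
            # The router at this link cannot forward the packet. It sends
            # ICMP 'fragmentation needed', unless that message is filtered.
            if icmp_blocked:
                return "lost silently"
            return "sender told to retry with MTU %d" % mtu
    return "delivered"


# Assumed path: Ethernet LAN (1500) followed by a PPPoE ADSL link (1492).
path = [1500, 1492]

for size in (576, 1492, 1500):
    print("%4d bytes, ICMP blocked:  %s" % (size, deliver(size, path, True)))
    print("%4d bytes, ICMP allowed:  %s" % (size, deliver(size, path, False)))
```

In this model a short request gets through either way, but a full 1500-byte packet, typical of a POST body or file upload, disappears without any error when ICMP is blocked, which from the application's side looks like a timeout.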

David Schwartz
  • This issue is no longer there. Actually, we changed our network structure some time back. But if I face this issue again, I will check whether ICMP is getting blocked. Thanks. – rahul286 Sep 28 '11 at 09:32

"Actually HTTP POSTs and FILE UPLOADS sometimes gets timed out."

It looks and sounds like you may have a routing loop, given that inbound access works but outbound does not. I could be completely wrong, but that's what it seems like.

Try setting up an SSH connection to an external system. If it fails, or is very slow, you might have a routing problem.

I would also think about removing one of the routers from the load-balancer. If you can traceroute with only one router, you have a routing or load balancing issue.

Joseph Kern
  • SSH access works. HTTP POSTs and file uploads also work most of the time; they just get stuck sometimes. No application is failing 100% of the time. But it seems to me that the traceroute output is not normal. Is it normal? – rahul286 Sep 02 '09 at 11:24