
Setup

Host B <--> Router <--> Host A
  • Host A: IP = 192.168.1.10, Net = 192.168.1.0/24, VLAN = 1, Default GW = 192.168.1.1 (Router)
  • Host B: IP = 192.168.2.10, Net = 192.168.2.0/24, VLAN = 20, Default GW = 192.168.2.1 (Router)
  • Router: IP = 192.168.1.1, 192.168.2.1, VLAN = 1, 20

All devices are connected to a switch with these VLANs configured.

Ping-Test

Now, if I try to ping Host A from Host B, the following occurs: Host B makes an ARP request to find out the MAC address of the router and sends the Ping request to the router. The router also makes an ARP request to find out the MAC address of the destination, Host A, and forwards the Ping request to Host A. That's OK, and it works.

ARP requests for another subnet??

Now the strange part: Host A, of course, tries to answer the Ping, but(!) it doesn't make an ARP request for the MAC address of the router in order to send the Ping reply there for forwarding to Host B. Instead, it sends an ARP request asking for the MAC address of Host B directly. Of course, that doesn't work: nobody on the local subnet answers, because the broadcast domain is restricted to VLAN 1.
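For comparison, here is a minimal sketch of the next-hop decision one would expect from Host A, assuming it is driven only by the two routes listed in the appendix (`ip r s`). The function name `arp_target` is made up for illustration, and the model ignores anything else the kernel may consult, such as cached redirects.

```python
import ipaddress

# Host A's routing table, copied from the `ip r s` output in the appendix.
# Each entry: (destination network, gateway, or None when directly connected).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), ipaddress.ip_address("192.168.1.1")),
    (ipaddress.ip_network("192.168.1.0/24"), None),
]


def arp_target(dst, routes=ROUTES):
    """Return the IP address a host would ARP for when sending to dst.

    Hypothetical helper: picks the longest matching prefix, then ARPs for
    the destination itself if the route is on-link, else for the gateway.
    """
    dst = ipaddress.ip_address(dst)
    matches = [(net, gw) for net, gw in routes if dst in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    _, gateway = max(matches, key=lambda route: route[0].prefixlen)
    return dst if gateway is None else gateway


print(arp_target("192.168.1.20"))  # on-link: ARP for the host itself
print(arp_target("192.168.2.10"))  # off-link: ARP for the default gateway
```

Under this model the reply to 192.168.2.10 would be sent via 192.168.1.1, so the observed ARP request for 192.168.2.10 suggests that something beyond these two routes is influencing the lookup.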

ARP cache on Host A (192.168.1.10) looks like this:

# arp -an
? (192.168.1.1) at 16:bc:aa:f2:bc:44 [ether] on eth0
? (192.168.2.10) at <incomplete> on eth0

When I try to delete the weird ARP resolution attempt, I get this message and the failed ARP attempt is still in cache:

# arp -d 192.168.2.10
SIOCDARP(dontpub): Network is unreachable

ICMP-Redirects from router

So no (bidirectional) communication between Host A and Host B is possible. And instead of Ping replies, Host B gets an ICMP Redirect from the router, telling it to send packets directly to Host A.

My questions

  1. What makes Host A try to send its answer by ARP-resolving a host on another subnet? Why is the Ping reply not sent to the router?
  2. Any idea what role the ICMP Redirect plays?

Appendix

Host A

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.1.1     0.0.0.0         UG    0      0        0 eth0
192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0

# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether ab:cd:a9:9a:cc:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ab:cd:a9:9a:cc:dd brd ff:ff:ff:ff:ff:ff

# ip r s
default via 192.168.1.1 dev eth0
192.168.1.0/24 dev eth0  proto kernel  scope link  src 192.168.1.10

Host B

# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.2.1     0.0.0.0         UG    0      0        0 eth0
192.168.2.0     0.0.0.0         255.255.255.0   U     1      0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 eth0

# ip a s
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 40:7d:7a:a3:f5:dd brd ff:ff:ff:ff:ff:ff
    inet 192.168.2.10/24 brd 192.168.2.255 scope global eth0
3: wlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 47:5e:33:a6:31:5e brd ff:ff:ff:ff:ff:ff

Router

Routing table:
Destination-IP   Subnet mask      Default gateway   Hop count     Interface
<public-net>     255.255.255.224  *                 0             eth2   
<public-net>     255.255.255.224  *                 0             eth1   
192.168.1.0      255.255.255.0    *                 0             eth0   
192.168.2.0      255.255.255.0    *                 0             eth0   
default          0.0.0.0          <public-router>   15            eth1   
default          0.0.0.0          <public-router>   40            eth2   
default          0.0.0.0          <public-router>   40            eth1

public-net ...... Address of public subnet (internet-uplink)

public-router ... Address of uplink-router

The router is a Cisco RV320 with a web interface only, so that's all I can get. PS: It's a load-balancing dual-uplink setup, but that shouldn't make a difference for the ARP problem.

Deputy Rock
  • What does the routing table on the router and clients look like? – joeqwerty Jan 07 '15 at 02:18
  • Something is probably wrong in the netmasks or routing table on host A. – David Schwartz Jan 07 '15 at 02:27
  • Add the output of these commands on Host A to your post: `ip a s` and `ip r s` – fukawi2 Jan 07 '15 at 02:37
  • I've added routing table etc of Host A (see above). – Deputy Rock Jan 07 '15 at 03:39
  • Is that really the full output of those commands? The last line of the route -n output looks odd (192.168.0 as the Destination), and the output of "ip r s" shows no default route. – Craig Miskell Jan 07 '15 at 05:03
  • Seeing the routing table from both hosts AND the router would be helpful. – joeqwerty Jan 07 '15 at 06:31
  • @CraigMiskell Sorry, some copy-paste problems – Deputy Rock Jan 07 '15 at 08:10
  • @joeqwerty Added some info. That's all I got, hope this helps. – Deputy Rock Jan 07 '15 at 08:10
  • The routing tables look sane, but the ARP behaviour of Host A looks like it thinks 192.168.2.10 is on a directly connected subnet. While I don't really hold much hope, could you share the output of "ip route get 192.168.2.10" from Host A, please? – Craig Miskell Jan 07 '15 at 08:21
  • The routing table of the router shows both internal networks as being connected/reachable via the same interface, eth0. I know you said you got it working but that's definitely not correct. – joeqwerty Jan 07 '15 at 15:17
  • @joeqwerty It's correct and not correct at the same time. It's the same physical interface, but different VLANs. Linux has special interface names with the VLAN ID encoded, but I don't know whether they are displayed in the Cisco web interface. As there is no shell access, it's really difficult to debug. But I found a solution anyway. Thanks! – Deputy Rock Jan 12 '15 at 09:30

3 Answers


The routing table on the router looks incorrect. It looks as if you are running both VLANs untagged from the router.

I don't know how the switch manages to deliver packets from the router to both A and B, when the router apparently sends all of the packets to the switch with no indication of which VLAN they belong to. The switch I am using wouldn't be able to do that. But perhaps you are using a brand of switch which can somehow correctly guess which VLAN to send the packets to.

However, from the router's point of view A and B are on the same Ethernet segment, which means the router is expected to instruct A and B to communicate directly without involving the router. And that is where communication breaks down.

The routing table entries that look like this:

192.168.1.0      255.255.255.0    *                 0             eth0   
192.168.2.0      255.255.255.0    *                 0             eth0   

should in fact look like this:

192.168.1.0      255.255.255.0    *                 0             eth0.1     
192.168.2.0      255.255.255.0    *                 0             eth0.20    
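To illustrate why the interface column matters, here is a rough sketch of the redirect decision. It assumes a deliberately simplified rule: the router issues an ICMP Redirect whenever a packet would leave through the same interface it arrived on. The names `REPORTED`, `TAGGED`, `iface_for` and `would_redirect` are made up for this example, and the ingress interface is approximated by looking up the source address in the same table. What the RV320 firmware actually checks is unknown.

```python
import ipaddress

# Interface per connected subnet, as shown in the router's web interface.
REPORTED = {"192.168.1.0/24": "eth0", "192.168.2.0/24": "eth0"}

# The same subnets bound to per-VLAN virtual interfaces instead.
TAGGED = {"192.168.1.0/24": "eth0.1", "192.168.2.0/24": "eth0.20"}


def iface_for(addr, table):
    """Return the interface whose connected subnet contains addr."""
    ip = ipaddress.ip_address(addr)
    for net, iface in table.items():
        if ip in ipaddress.ip_network(net):
            return iface
    raise LookupError(f"{addr} is not on a connected subnet")


def would_redirect(src, dst, table):
    """Simplified assumption: redirect when ingress and egress interface match."""
    return iface_for(src, table) == iface_for(dst, table)


# Ping request from Host B (192.168.2.10) to Host A (192.168.1.10).
print(would_redirect("192.168.2.10", "192.168.1.10", REPORTED))
print(would_redirect("192.168.2.10", "192.168.1.10", TAGGED))
```

With the table as reported, the model yields a redirect for traffic between the two hosts. With separate `eth0.1` and `eth0.20` interfaces it does not.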

The virtual interfaces eth0.1 and eth0.20 can be created with the commands:

vconfig add eth0 1
vconfig add eth0 20
kasperd
  • Sounds logical, but the routing table is just from the web interface of the Cisco RV320. It's a "small business" router based on Linux without shell access. It seems that this Linux version is quite heavily modified (special drivers for hardware NAT and so on). So I'm not sure whether VLAN membership is shown in the same notation as on an ordinary Linux machine. – Deputy Rock Jan 07 '15 at 14:35
  • @DeputyRock Without shell access every problem is going to be tricky to debug... – kasperd Jan 07 '15 at 18:08

Found a solution that works for me: I moved Host A and the subnet 192.168.1.0/24 into a new VLAN with ID 10. Now everything is fine. That's OK for my overall configuration, but it's still strange that it didn't work with VLAN ID 1. Maybe the router is the problem and treats VLAN 1 in a special manner. But how could that affect Linux ARP behavior? That's still an open question.

Deputy Rock

The behaviour you are seeing with VLAN 1 is usually because that VLAN ID is the management VLAN on the switch, which is untagged.

IgorC