
I have two OSX machines on a local subnet. Pings from Machine A to Machine B consistently time out until Machine B pings Machine A. After B pings A, A can successfully ping B, but this success is temporary; if there is no communication between the two machines for a half hour or so, pings from A to B begin timing out again (until B pings A again).

How can I ensure that B is always reachable from A, without having to "kickstart" the connection by pinging A from B first?

ericsoco
  • How about ICMP traffic to/from other machines on the network? – Mathias R. Jessen Jan 03 '13 at 23:46
  • Sounds like there's some kind of state-based firewall in your network or on your OS X machines. When you do the ping that succeeds, it must be creating a temporary `pass` entry in the ruleset, so, subsequent pings from the other machine can succeed, too. – cnst Jan 03 '13 at 23:52
  • When you can't `ping`, what goes wrong? How far do you get? (Does the sending machine get an ARP entry for the receiving machine? Is the ARP entry correct? Does the receiving machine receive the ping? Does it send a reply?) – David Schwartz Jan 04 '13 at 00:21
  • @DavidSchwartz, see my comments on SmirksWhileWalkingWCabaretGirl's answer below re: ARP entries. oy, SmirksWhileWalkingWCabaretGirl, can't you find a shorter username?? – ericsoco Jan 04 '13 at 01:36
  • @ericsoco I know this is not an answer, but I can't comment because I don't have enough points. But I believe it's worth noting that I use the same router, an E1200, and have the same problem. Raspbian (Raspberry Pi with Debian) is the host that has to ping first in order to be pinged by the other hosts (in your case it's host B), with Win7 and Ubuntu as the other hosts (thus host A, in your case). Raspbian is set to a static IP, and the rest are on dynamic IPs. I believe that Raspbian doesn't suspend/hibernate, because when I cannot ping it, it still serves a webserver and ssh server (port forward with E1200, but – simonsays Aug 09 '13 at 06:53

2 Answers


When you say the same subnet, are you clear about the industry definition of subnet? It is a frequently, baroquely misused term.

I want to confirm that the two OS X boxes aren't separated by a router (firewalls are routers with attitude) that links two address schemes, with one box on each.

What you're describing is a dead ringer for stale ARP records going unnoticed because the ARP pruning plan is either nonexistent or misconfigured. ARP is the protocol that maps IP addresses to Ethernet (MAC) addresses, and IP traffic on a local segment depends on it working properly.

You're describing one system that fails to find the other until the very computer it just failed to find reaches out to it first.

If this is an ARP pruning issue, the missing computer, when it pings, is updating your first and errant computer with its current ARP/IP combination. If they're both on different subnets, an industry-grade switch such as a Cisco or NetGear professional device would manage these records seamlessly, so this would happen far less often than on a network unequipped with such gear.

A good question to ask: is Machine B equipped with a more aggressive power management policy, or is it used less frequently and allowed to suspend or hibernate? Suspending isn't a perfect event, and the Windows community sees this problem on larger small-company networks that try to save money on electricity but haven't upgraded their network infrastructure. PCs tap out, and their IP addresses get leased to someone else or returned to the address pool and remain unassigned. Your Machine A remains insistent that Machine B must be at a particular IP address long after B was evicted from that address.

Confirm this by preventing both machines from suspending or turning off their NICs to conserve power. WiFi NICs typically use more power and are more aggressively managed, so if these are WiFi boxes, check all the power management settings on both machines to ensure the NICs stay awake and powered.

TCP/IP commands have more in common across Mac and Windows than most other commands do. On the Windows side, we use `arp -a` to dump the listing of IP addresses and their ARP entries. The next time your B box is unreachable, and assuming you can get to it physically, visit it first and don't ping anything yet, so that nothing wakes up before you determine its current IP address.

Then, from Machine A, confirm that it still can't find Machine B and run the equivalent `arp -a` command. Compare the IP that Machine A's cache holds for B (A being the box that can't find B until B announces itself) with the IP you found by physically inspecting B. If the two differ, you've got an ARP pruning issue, and you owe me full credit for this detailed answer.
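The before/after comparison of `arp -a` output can be sketched in Python. This is a rough sketch only, assuming the macOS/BSD line format (`? (ip) at mac on en1 ...`); Windows prints a different layout and would need a different pattern. The function names and the MAC addresses in the samples are made up for illustration; the IPs mirror the ones reported in the comments.

```python
import re
import subprocess

# Assumed macOS/BSD line format, for example:
#   ? (192.168.1.110) at 0:1e:c2:aa:bb:cc on en1 ifscope [ethernet]
ARP_LINE = re.compile(r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>\S+)")


def parse_arp_table(text):
    """Return {ip: mac} from `arp -a` output; incomplete entries map to None."""
    table = {}
    for line in text.splitlines():
        match = ARP_LINE.search(line)
        if match:
            mac = match.group("mac")
            table[match.group("ip")] = None if mac == "(incomplete)" else mac.lower()
    return table


def diff_arp_tables(before, after):
    """Return {ip: (old_mac, new_mac)} for entries that appeared, vanished, or changed.

    Missing and incomplete entries are both treated as None.
    """
    changes = {}
    for ip in sorted(set(before) | set(after)):
        old, new = before.get(ip), after.get(ip)
        if old != new:
            changes[ip] = (old, new)
    return changes


def read_arp_table():
    """Run `arp -a` on this machine and parse its output."""
    result = subprocess.run(["arp", "-a"], capture_output=True, text=True, check=True)
    return parse_arp_table(result.stdout)


# Hypothetical snapshots taken on Machine B: only the router before B pings A,
# router plus A afterwards. The MAC addresses are invented.
SAMPLE_BEFORE = "? (192.168.1.1) at 0:1e:e5:11:22:33 on en1 ifscope [ethernet]\n"
SAMPLE_AFTER = (
    "? (192.168.1.1) at 0:1e:e5:11:22:33 on en1 ifscope [ethernet]\n"
    "? (192.168.1.109) at 0:26:bb:aa:bb:cc on en1 ifscope [ethernet]\n"
)
```

On a live machine, call `read_arp_table()` once while B is unreachable and once after B has pinged A, then pass both results to `diff_arp_tables`. An IP whose MAC changes between the two snapshots points at a stale entry; an IP that merely appears points at an entry that had expired.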

mwfearnley
  • hm, this sounds like it may be the right direction. a quick google turns up people frustrated with OSX's inability to manage power to NICs: https://discussions.apple.com/thread/3570949. will try your suggestion momentarily... – ericsoco Jan 04 '13 at 01:20
  • well, my IPs are not changing -- A stays on 192.168.1.109 and B stays on 192.168.1.110. this is a home network, via a cisco/linksys E1200, so i definitely don't expect the benefits of the equipment you describe. however, arp -a on each machine outputs different values before/after B 'announces' itself to A via a ping: B lists only the router (1.1) before, but lists A (1.109) after; A is similar but can also see a couple other addresses on the subnet. any clues from that? – ericsoco Jan 04 '13 at 01:29
  • oh and -- re: correct usage of "LAN"/"subnet", the boxes *are* separated by a router (and are both connecting via wifi). am i misusing terminology? if so, please suggest different verbiage for my original question and i can edit it. – ericsoco Jan 04 '13 at 01:42
  • Ericsoco, based on the comment provided in response, the ARP cache on machine B (B is the box that is unreachable until it announces itself) has the ARP address of the gateway, and always should unless it has just turned on and not yet rediscovered the network. Without being at your site, it would seem one of the systems' NICs is powering down, and the switch/router's inability to see it is resulting in a pruned record. That would make perfect sense unless B is in constant online use and isn't powering down; then the next cause would likely be an answer to a question: is one of the boxes in a DMZ? – SmirksWhileWalkingWCabaretGirl Jan 04 '13 at 14:49
  • oh and -- re: correct usage of "LAN"/"subnet" / This part is becoming needlessly conflicting. You describe the systems as separated by a router, but the IP addressing scheme implies a shared subnet, as does the fact that both systems display unique ARP addresses for each other. Both items contradict the presence of a router dividing the systems, because the ARP addresses would be invisible to each other, the way a good router should filter such characteristics. Apple OSX, for all of its flaws, is more faithful and authentic in its implementation of TCP than Microsoft Shintdows has ever been. – SmirksWhileWalkingWCabaretGirl Jan 04 '13 at 14:52
  • Machine B is running a program that continually accesses the internet, so I'm assuming it's keeping its NIC awake for that. Machine A doesn't lose its ability to contact B as long as it's accessing the network, so its NIC is probably falling asleep and then the ARP record gets pruned. IOW, point to @ SmirksWhileWalkingWCabaretGirl. – ericsoco Jan 04 '13 at 19:07

Check for a duplicate MAC address on your local network.

This sounds a lot like a network problem I had a while ago.
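One rough way to look for this is to scan `arp -a` output for a MAC address that is listed against more than one IP. The Python sketch below assumes the macOS/BSD line format; the function names and sample addresses are hypothetical. A shared MAC is only a hint, since a single host can legitimately hold several IPs.

```python
import re
from collections import defaultdict

# Assumed macOS/BSD line format: "? (ip) at mac on en1 ifscope [ethernet]".
# Incomplete entries have no hex MAC after "at" and are skipped by the pattern.
ARP_LINE = re.compile(r"\((?P<ip>\d+\.\d+\.\d+\.\d+)\) at (?P<mac>[0-9A-Fa-f:]+)")

BROADCAST_MAC = "ff:ff:ff:ff:ff:ff"


def normalize_mac(mac):
    """Lowercase the MAC and zero-pad each octet (macOS prints 0:1e:... style)."""
    return ":".join(part.zfill(2) for part in mac.lower().split(":"))


def find_shared_macs(arp_output):
    """Return {mac: [ips]} for every MAC listed against more than one IP."""
    ips_by_mac = defaultdict(list)
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if not match:
            continue
        mac = normalize_mac(match.group("mac"))
        if mac == BROADCAST_MAC:
            continue  # the broadcast address is expected, not a duplicate
        ips_by_mac[mac].append(match.group("ip"))
    return {mac: ips for mac, ips in ips_by_mac.items() if len(ips) > 1}


# Hypothetical table in which two hosts answer with the same MAC.
SAMPLE_ARP = (
    "? (192.168.1.1) at 0:1e:e5:11:22:33 on en1 ifscope [ethernet]\n"
    "? (192.168.1.109) at 0:26:bb:aa:bb:cc on en1 ifscope [ethernet]\n"
    "? (192.168.1.110) at 00:26:BB:AA:BB:CC on en1 ifscope [ethernet]\n"
    "? (192.168.1.255) at ff:ff:ff:ff:ff:ff on en1 ifscope [ethernet]\n"
)
```

Feeding it the real output of `arp -a` from a third machine on the same network gives the clearest picture, because that machine sees both suspects from the outside.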

mdpc