I'm not an admin but since our regular guy is on vacation, the problem ended up in my lap. I'll be as brief as I can.

We noticed that our SQL Server 2005 instance was acting weird: the app starts and fails to connect to the db, but works just fine after a restart. The same goes for SQL Server Management Studio. This behavior has been observed on several machines on the network, so it's probably not a client issue. At the same time, connecting via the server's IP address always works, which to me as a novice looks like a name resolution issue.

Pinging the server by name results in Destination host unreachable on the first try and successful pings on subsequent tries. After an indeterminate amount of time, the same cycle repeats itself. Again, pinging the server's IP works flawlessly.

Event Viewer contains Errors 4004 and 4015 in the DNS section. Attempts to fix them using Google have so far been unsuccessful.

Question: is there a simple fix?

Update

I managed to eliminate Error 4004 by reinstalling the DNS service, although Error 4015 is still present.

Another interesting thing I noticed related to that failed first ping:

Pinging oxyserver [169.254.2.62] with 32 bytes of data:
Reply from 169.254.74.29: Destination host unreachable.
Reply from 169.254.74.29: Destination host unreachable.

I have no idea how it came up with this IP address (169.254.2.62), because right after that, ping correctly gets the server's IP address and it works just fine:

Pinging oxyserver [192.168.1.201] with 32 bytes of data:
Reply from 192.168.1.201: bytes=32 time<1ms TTL=128
Reply from 192.168.1.201: bytes=32 time<1ms TTL=128

Update 2

As requested, the results of dnscmd /info:

Query result:
Server info
        server name              = oxyserver.Oxy.loc
        version                  = 0ECE0205 (5.2 build 3790)
        DS container             = cn=MicrosoftDNS,cn=System,DC=Oxy,DC=loc
        forest name              = Oxy.loc
        domain name              = Oxy.loc
        builtin domain partition = ForestDnsZones.Oxy.loc
        builtin forest partition = DomainDnsZones.Oxy.loc
        last scavenge cycle      = not since restart (0)
  Configuration:
        dwLogLevel               = 00000000
        dwDebugLevel             = 00000000
        dwRpcProtocol            = FFFFFFFF
        dwNameCheckFlag          = 00000002
        cAddressAnswerLimit      = 0
        dwRecursionRetry         = 3
        dwRecursionTimeout       = 15
        dwDsPollingInterval      = 180
  Configuration Flags:
        fBootMethod                  = 3
        fAdminConfigured             = 0
        fAllowUpdate                 = 1
        fDsAvailable                 = 1
        fAutoReverseZones            = 1
        fAutoCacheUpdate             = 0
        fSlave                       = 0
        fNoRecursion                 = 0
        fRoundRobin                  = 1
        fStrictFileParsing           = 0
        fLooseWildcarding            = 0
        fBindSecondaries             = 1
        fWriteAuthorityNs            = 0
        fLocalNetPriority            = 1
  Aging Configuration:
        ScavengingInterval           = 0
        DefaultAgingState            = 0
        DefaultRefreshInterval       = 168
        DefaultNoRefreshInterval     = 168
  ServerAddresses:
 Addr Count = 2
                Addr[0] => 192.168.1.201
                Addr[1] => 169.254.2.62
  ListenAddresses:
        NULL IP Array.
  Forwarders:
        NULL IP Array.
        forward timeout  = 5
        slave            = 0
Command completed successfully.

The two addresses are an obvious red flag.
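
For anyone who wants to check this without eyeballing the output, here is a minimal Python sketch that pulls the ServerAddresses entries out of saved dnscmd /info text and flags any that fall in the link-local (169.254.0.0/16) range. It is only an illustration: `server_addresses` and `link_local_only` are hypothetical helper names, and the parsing assumes the layout shown in the output above.

```python
import ipaddress
import re

# Excerpt of the dnscmd /info output posted above.
SAMPLE = """\
  ServerAddresses:
 Addr Count = 2
                Addr[0] => 192.168.1.201
                Addr[1] => 169.254.2.62
  ListenAddresses:
        NULL IP Array.
"""


def server_addresses(dnscmd_output):
    """Return the addresses listed under 'ServerAddresses:' (assumed layout)."""
    addresses = []
    in_section = False
    for line in dnscmd_output.splitlines():
        stripped = line.strip()
        if stripped == "ServerAddresses:":
            in_section = True
            continue
        if in_section:
            match = re.match(r"Addr\[\d+\]\s*=>\s*(\S+)", stripped)
            if match:
                addresses.append(match.group(1))
            elif stripped.endswith(":"):
                # The next section header ends the ServerAddresses block.
                break
    return addresses


def link_local_only(addresses):
    """Keep only the addresses in the link-local (APIPA) range."""
    return [a for a in addresses if ipaddress.ip_address(a).is_link_local]


if __name__ == "__main__":
    found = server_addresses(SAMPLE)
    print("Server addresses:", found)
    print("Link-local (suspicious):", link_local_only(found))
```

Any address the second function returns is one the DNS server is registered on but that clients on the normal subnet cannot reach.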

Changing the priority of the NICs in Network Connections/Advanced appears to have gotten rid of error 4015. However, the original problem still exists.

dandan78
  • 169.254.x.x indicates an APIPA address. It means your computers are losing their connection to the network and Windows is falling back to an "OH CRAP" (generally useless) self-assigned address. Computers that get one will be able to talk to each other ... but nothing else. – Daniel B. Jul 08 '11 at 12:32
  • Why the nameserver is providing that address ... I don't know. – Daniel B. Jul 08 '11 at 12:34
  • Perhaps I should've mentioned that we have a router that is doing DHCP. Does that mean I should disable the DHCP service on the server machine? – dandan78 Jul 08 '11 at 12:36
  • No ... the router providing DHCP is fine. APIPA is what happens when a host doesn't get a response from a DHCP server; it's Windows' way of creating a little pocket network when the correct one isn't reachable. – Daniel B. Jul 08 '11 at 12:45
  • Wait, you have DHCP coming from both your router and your server? That in itself is odd, but it shouldn't be causing this issue. Might want to ask your admin why he's got DHCP set up like that. – Daniel B. Jul 08 '11 at 12:51
  • Since you have DHCP coming from two different locations, that could cause a whole host of issues if they are not handing out the same settings. What DNS servers (primary and secondary) are set on both the router and your server? I would also check that the DNS forwarders are set up correctly. I would work on getting down to one DHCP server or, if there is a business requirement for having two, make sure their ranges are not overlapping, which could also cause duplicate-address issues with ARP, as already mentioned. – Nixphoe Jul 08 '11 at 22:51

4 Answers


Given the APIPA address it's giving you on the first try I'm inclined to think your nameserver is corrupt, or has bad DNS records somewhere. Check the records for the hosts that are returning the wrong address.

Try this: open a command prompt. Type ipconfig /flushdns. Now try to ping the server and see what you get.
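
If you'd rather script the check than watch ping, here is a rough Python sketch that resolves a name and labels each address it gets back, so an APIPA (169.254.x.x) result stands out. It is only a sketch: `classify` and `resolve_and_classify` are hypothetical helper names, and it covers IPv4 results only.

```python
import ipaddress
import socket


def classify(address):
    """Label an address as link-local (APIPA), private, or other."""
    ip = ipaddress.ip_address(address)
    # Check link-local first: 169.254.0.0/16 also counts as private in Python.
    if ip.is_link_local:
        return "link-local (APIPA)"
    if ip.is_private:
        return "private"
    return "other"


def resolve_and_classify(hostname):
    """Resolve hostname to its IPv4 addresses and classify each one."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return {addr: classify(addr) for addr in addresses}


if __name__ == "__main__":
    # The two addresses ping reported for the server in the question.
    for addr in ("169.254.2.62", "192.168.1.201"):
        print(addr, "->", classify(addr))
```

On a client, calling `resolve_and_classify("oxyserver")` before and after the flush would show whether the bad address is still being handed back.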

Daniel B.

How many NICs does your server have? I've seen this error come up at my workplace when [somehow] a different NIC was set to a higher priority than the primary NIC.

Edit: Can you check your ARP cache? I think Daniel's onto something with the APIPA.

Open a command prompt and type arp -a and post the output please.

Daniel B.
Mountainerd
  • That could be it ... I've seen it mentioned in some of the forums I've been looking at. – Daniel B. Jul 08 '11 at 13:08
  • Well, there are two of those network connection icons in the system tray and one of them is disconnected. I changed their priority earlier today after reading about that on some forum, but it doesn't appear to have helped. As for your second question, although I have no idea what you're asking me, I can respond if you tell me where to look. :) – dandan78 Jul 08 '11 at 13:10
  • Let's try something a bit easier that'll provide information: run dnscmd /info. Can you post the output (you can make the domain info generic, if you prefer)? – Mountainerd Jul 08 '11 at 13:15
  • Doesn't he have to be in an integrated zone to be getting the event ID 4015? – Daniel B. Jul 08 '11 at 13:20
  • @dan the latest reboot seems to have eliminated 4015. It was probably that NIC priority that I messed with. – dandan78 Jul 08 '11 at 13:22
  • @Daniel Good sir, I wish I had thought about that question before I added it. – Mountainerd Jul 08 '11 at 13:26
  • @dandan Excellent! Glad it works for you now. – Mountainerd Jul 08 '11 at 13:27
  • @cz Wait, you might've misunderstood me. :) 4015 is gone, but I still have the name resolution issue. Looks like they had nothing to do with each other. – dandan78 Jul 08 '11 at 13:28
  • Oh. Well, hm. Can you check your ARP entries to see if any of them point to the disabled NIC's MAC (arp -a)? You'll need to know the MAC of the disabled NIC, too. ARP uses MAC addresses, so traffic may still be trying to flow in that direction. Also, flushing your DNS cache (ipconfig /flushdns) may be beneficial as well. – Mountainerd Jul 08 '11 at 13:40
  • ARP tables are flushed frequently on their own. They would also have been cleared when he rebooted. – Daniel B. Jul 08 '11 at 14:14
  • Beyond that, ARP is IP-to-MAC resolution; it wouldn't even come up in name resolution. – Daniel B. Jul 08 '11 at 14:23
  • I posted the solution in a separate answer. Thanks, Josh and Daniel, for your help. Don't really know who to award the solution to. – dandan78 Jul 08 '11 at 14:28
  • Give it to Josh ;) – Daniel B. Jul 08 '11 at 14:59

"Pinging the server by name results in a Destination host unreachable on the first try"

Maybe a "farther reach" here, but you might take a look at the power management settings on the server's NIC. It normally should not get changed after everything is set up and running.

user48838
  • The NIC was set to `Allow the computer to turn off this device to save power`. Unchecked it. Problem persists. – dandan78 Jul 08 '11 at 09:49
  • Hmmm... A little bit of a "yellow flag", as most servers should not have such settings enabled. You might also consider checking the BIOS settings and possibly the overall power settings (i.e. is the server also set to "Sleep" on non-activity?). – user48838 Jul 08 '11 at 09:59
  • I thought so too, but the server definitely is not sleeping and I'm pretty sure the NIC setting doesn't actually affect anything because I've been connected to the server via Remote Desktop for the last two days so that I can keep track of things. Other than that, the power scheme is Always On. – dandan78 Jul 08 '11 at 10:04
  • RDP traffic is not continuous, it can/will back off if there is no activity. That aside, you might check at the next point - possibly the directly connected switch. Some of the newer consumer-grade switches are coming out with "Green" capabilities (which may not be suitable for hosting/interconnecting servers). – user48838 Jul 08 '11 at 10:09
  • Check out my update. I think it's significant because it shows that the first ping fails because the name->ip address translation is wrong. – dandan78 Jul 08 '11 at 12:29
  • "TAP-Win32 Adapter v9" appears to be a pseudo definition for an OpenVPN configuration. Should the server have such a configuration? It seems it may have been answering the ARP first and then possibly figuring out it is not currently active with a live session, stops responding and then another round of ARP takes place (due to no further responses for resolution) where the physical NIC can then answer correctly. – user48838 Jul 08 '11 at 18:02

I managed to resolve the problem by disabling the second network adapter (TAP-Win32 Adapter v9), which was listed as disconnected, but was being assigned an IP address for some reason.

I appreciate everybody's help. What led me to the solution was Josh's suggestion to run arp -a. When that didn't return anything for the 169.x.x.x address, I got an urge to run ipconfig /all, which ended up listing the 169.x.x.x address next to that other NIC. A simple disable followed by a reboot fixed it.

dandan78
  • That still doesn't make sense to me ... I suppose if the ... oh. Hm. Maybe AD was giving a bad IP and disabling it and rebooting fixed it. Man. You find the best problems :p – Daniel B. Jul 08 '11 at 14:38
  • That second NIC, which is a total mystery to me because there's only one Ethernet port on the server, wasn't disabled; it was just listed as disconnected (edited post to clarify that). The actual disabling appears to have done the trick, but I don't quite understand why changing the priority of the NICs didn't do that before. Weird. – dandan78 Jul 08 '11 at 14:44
  • Your solution looks like it might be a stopgap. Go into the DHCP service on your AD computer, check scope options and ensure that DNS is ONLY giving the correct IP for your DNS server. I think what happened is you have APIPA addresses stored in host WINS caches, and when they couldn't contact the name server, they bumped to WINS, which gave them the wrong address. Just a hunch, it's still weird that it only happened on the first attempt. – Daniel B. Jul 08 '11 at 14:58