
I'm testing out an error path that requires me to drop a request from getaddrinfo. I set up 2 VMs:

  • RHEL 7.9
  • Ubuntu 20

The code is the same on both machines: just a call to getaddrinfo for test.com. I blocked all incoming packets to simulate a dropped getaddrinfo request; however, in exactly the same scenario, the 2 OSes behave differently:

  • RHEL times out after 12 seconds with an error EAI_NONAME (No such file or directory)
  • Ubuntu times out after 20 seconds with an error EAI_AGAIN (Resource temporarily unavailable)

So my 2 questions are:

  • Why do these give 2 different errors?
  • Why are the timeouts different, and where are they defined? I tried to look at the Linux source but couldn't figure this out.
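For the second question, one place a timeout comes from is glibc's stub resolver itself. It reads per-query settings from /etc/resolv.conf (`options timeout:n attempts:n`, documented in resolv.conf(5); the defaults are 5 seconds and 2 attempts) and tries each `nameserver` line in turn. The sketch below, which assumes glibc's `res_init()` and the `_res` resolver state, only reports the values a process actually loaded so the two VMs can be compared; `resolver_settings` is a hypothetical helper name. It does not by itself explain the observed 12 s and 20 s, and on Ubuntu the local systemd-resolved stub at 127.0.0.53 presumably applies its own upstream timeouts as well.

```c
#define _DEFAULT_SOURCE            /* resolv.h needs the BSD u_char/u_long types */
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>

/* Hypothetical helper: report the stub resolver's retry settings as loaded
   from /etc/resolv.conf (and the RES_OPTIONS environment variable).
   Returns 0 on success, -1 if res_init() fails. */
int resolver_settings (int *timeout_s, int *attempts, int *nservers)
{
  if (res_init () != 0)
    return -1;
  *timeout_s = _res.retrans;   /* "options timeout:n", default 5 seconds */
  *attempts = _res.retry;      /* "options attempts:n", default 2 */
  *nservers = _res.nscount;    /* number of nameserver entries in use */
  return 0;
}
```

Calling it from a small `main` on each VM and printing the three values shows whether the two machines even start from the same resolver configuration.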

Code:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main (void)
{
  struct addrinfo hints, *res, *result;
  int errcode;
  char addrstr[INET6_ADDRSTRLEN];
  void *ptr;

  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_UNSPEC;        /* accept IPv4 and IPv6 results */
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags |= AI_CANONNAME;

  errcode = getaddrinfo ("test.com", NULL, &hints, &result);
  if (errcode != 0)
    {
      /* getaddrinfo() reports failures through its return value (EAI_*);
         perror() would print errno, which is only meaningful for EAI_SYSTEM. */
      fprintf (stderr, "getaddrinfo: %s (%d)\n", gai_strerror (errcode), errcode);
      return EXIT_FAILURE;
    }

  for (res = result; res != NULL; res = res->ai_next)
    {
      /* Point at the address field that matches this entry's family. */
      switch (res->ai_family)
        {
        case AF_INET:
          ptr = &((struct sockaddr_in *) res->ai_addr)->sin_addr;
          break;
        case AF_INET6:
          ptr = &((struct sockaddr_in6 *) res->ai_addr)->sin6_addr;
          break;
        default:
          continue;                   /* skip families we can't print */
        }
      inet_ntop (res->ai_family, ptr, addrstr, sizeof (addrstr));
      /* ai_canonname is only filled in on the first entry. */
      printf ("IPv%d address: %s (%s)\n", res->ai_family == AF_INET6 ? 6 : 4,
              addrstr, res->ai_canonname ? res->ai_canonname : "");
    }

  freeaddrinfo (result);
  return EXIT_SUCCESS;
}

Compiled with:

gcc test.c

RHEL resolv.conf:

search ht.home
nameserver 192.168.0.1
nameserver [IPV6 address 1]
nameserver [IPV6 address 2]

Ubuntu resolv.conf:

nameserver 127.0.0.53
options edns0 trust-ad
search ht.home

TreeWater
  • Post the code, and the compilation arguments. – Andrew Henle Mar 04 '21 at 23:09
  • @AndrewHenle posted. Didn't originally include it since both were the same program and compiled the same way so didn't think it was relevant – TreeWater Mar 04 '21 at 23:14
  • Does `resolv.conf` look the same in both environments? – larsks Mar 04 '21 at 23:23
  • @larsks no they are different. I'll post in the description – TreeWater Mar 05 '21 at 00:03
  • I would chalk it up to the fact that you are talking to different resolvers: on your Ubuntu system, you're talking to a local `systemd-resolved` instance, while on the RHEL system you're talking to whatever is running on 192.168.0.1. It's highly likely the two resolvers respond differently. Does the behavior of your code change if you modify the Ubuntu resolv.conf to look like the RHEL one? – larsks Mar 05 '21 at 00:19
  • Unfortunately changing the resolv.conf doesn't change anything :( – TreeWater Mar 05 '21 at 00:43
  • Do you possibly have different nss configurations in `/etc/nsswitch.conf`? – R.. GitHub STOP HELPING ICE Mar 05 '21 at 15:36
  • In any case, to track this down you want to use `strace` and `tcpdump` to get a clear picture of the sequence of syscalls and packets traveling. – R.. GitHub STOP HELPING ICE Mar 05 '21 at 15:37
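Following up on the nsswitch.conf question: the `hosts:` line there decides which NSS modules getaddrinfo consults and in what order, so it is worth comparing between the two VMs. Below is a minimal sketch of a hypothetical helper, `nss_hosts_sources`, that pulls that line's source list out of a buffer holding the file's contents. It is a simple line scan for comparison purposes, not glibc's own nsswitch parser, and it copies action entries such as `[NOTFOUND=return]` through unchanged.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: find the "hosts:" line in nsswitch.conf-style text
   and copy its source list (e.g. "files dns") into out, which must have
   room for at least one byte. Returns 0 if found, -1 otherwise. */
int nss_hosts_sources (const char *text, char *out, size_t outlen)
{
  const char *line = text;

  while (line != NULL && *line != '\0')
    {
      const char *eol = strchr (line, '\n');
      const char *end = eol ? eol : line + strlen (line);
      const char *p = line;

      while (p < end && isspace ((unsigned char) *p))   /* skip indentation */
        p++;
      if (end - p >= 6 && strncmp (p, "hosts:", 6) == 0)
        {
          size_t n;

          p += 6;
          while (p < end && isspace ((unsigned char) *p))
            p++;
          n = (size_t) (end - p);
          while (n > 0 && isspace ((unsigned char) p[n - 1]))  /* trim '\r', spaces */
            n--;
          if (n >= outlen)
            n = outlen - 1;                                /* truncate to fit */
          memcpy (out, p, n);
          out[n] = '\0';
          return 0;
        }
      line = eol ? eol + 1 : NULL;   /* advance to the next line, if any */
    }
  return -1;
}
```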

1 Answer

The Ubuntu behavior here is correct and the RHEL one is wrong: the correct result is inconclusive (EAI_AGAIN), since the resolver was neither able to get an address for the name nor able to get a response proving that the name does not exist.
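The distinction drawn here, between a lookup that definitively failed and one that merely could not be completed, can be made explicit in the calling code. A minimal sketch, assuming only the standard EAI_* constants from <netdb.h>; the enum and `classify_gai` are hypothetical names:

```c
#define _DEFAULT_SOURCE            /* expose the EAI_* constants under strict -std= modes */
#include <netdb.h>

/* Hypothetical outcomes of a getaddrinfo() call, as seen by the caller. */
enum lookup_outcome
{
  LOOKUP_OK,            /* addresses were returned */
  LOOKUP_NO_SUCH_NAME,  /* the name was reported not to exist */
  LOOKUP_INCONCLUSIVE,  /* temporary failure: no answer either way */
  LOOKUP_OTHER_ERROR    /* anything else (bad flags, memory, ...) */
};

enum lookup_outcome classify_gai (int errcode)
{
  switch (errcode)
    {
    case 0:
      return LOOKUP_OK;
    case EAI_NONAME:
      return LOOKUP_NO_SUCH_NAME;
    case EAI_AGAIN:
      return LOOKUP_INCONCLUSIVE;
    default:
      return LOOKUP_OTHER_ERROR;
    }
}
```

A caller would typically retry, or report a temporary failure, on `LOOKUP_INCONCLUSIVE`, and only treat `LOOKUP_NO_SUCH_NAME` as evidence that the name does not exist. That is exactly why an EAI_NONAME produced by a timeout, as on RHEL, is misleading.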

The mechanism is probably a mix of glibc bugs (or rather, intentionally inconsistent behavior) and the difference between the two configurations: on RHEL, queries go to a remote nameserver that you've blocked, while on Ubuntu they are proxied through systemd-resolved. Perhaps you haven't blocked systemd-resolved itself, only its outgoing queries to the real network? You could confirm the differences by running your test program under strace and watching tcpdump on both the loopback and the real network interfaces.
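Alongside strace and tcpdump, it can help to measure exactly how long each OS spends inside getaddrinfo before giving up. This is a sketch of a hypothetical `timed_getaddrinfo` wrapper built on `clock_gettime(CLOCK_MONOTONIC)`; it assumes a POSIX system and only measures the call, it does not show where the time goes.

```c
#define _DEFAULT_SOURCE            /* getaddrinfo and clock_gettime under strict -std= modes */
#include <string.h>
#include <time.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical wrapper: time one getaddrinfo() call with a monotonic clock.
   Returns the getaddrinfo() error code; *elapsed_s receives the duration. */
int timed_getaddrinfo (const char *host, int flags, double *elapsed_s)
{
  struct addrinfo hints, *result = NULL;
  struct timespec start, end;
  int rc;

  memset (&hints, 0, sizeof (hints));
  hints.ai_family = AF_UNSPEC;
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_flags = flags;

  clock_gettime (CLOCK_MONOTONIC, &start);
  rc = getaddrinfo (host, NULL, &hints, &result);
  clock_gettime (CLOCK_MONOTONIC, &end);

  *elapsed_s = (double) (end.tv_sec - start.tv_sec)
               + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
  if (rc == 0)
    freeaddrinfo (result);   /* only free on success */
  return rc;
}
```

Calling `timed_getaddrinfo ("test.com", AI_CANONNAME, &secs)` on each VM with the firewall rule in place, and printing `gai_strerror (rc)` and `secs`, gives numbers that can be lined up against the strace timestamps.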

Basically, under some conditions, glibc treats errors the same as nonexistence of the name, while under others, it treats them as a reportable failure. If you're able to query the local systemd-resolved, it will return a ServFail error code because it can't get a result or cryptographic proof of nonexistence from the upstream nameservers, and glibc probably reports this, but doesn't report its own failure to contact the nameserver.
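To make the argument concrete, here is a hypothetical mapping from what the resolver got back (no reply at all, or a DNS response code from <arpa/nameser.h>) to the getaddrinfo error the answer argues is appropriate. It is a sketch of that reasoning, not glibc's actual code path, and `expected_gai_error` is an invented name.

```c
#define _DEFAULT_SOURCE            /* arpa/nameser.h needs the BSD u_char types */
#include <netdb.h>
#include <sys/types.h>
#include <arpa/nameser.h>

/* Hypothetical sketch: which EAI_* code a lookup "should" produce.
   got_reply is 0 if every nameserver timed out; rcode is the DNS RCODE. */
int expected_gai_error (int got_reply, int rcode)
{
  if (!got_reply)
    return EAI_AGAIN;        /* no answer at all: inconclusive, not "no such name" */
  switch (rcode)
    {
    case ns_r_noerror:
      return 0;              /* success (an empty answer section needs separate handling) */
    case ns_r_nxdomain:
      return EAI_NONAME;     /* the server asserts the name does not exist */
    case ns_r_servfail:
      return EAI_AGAIN;      /* e.g. systemd-resolved could not reach upstream */
    default:
      return EAI_FAIL;       /* refused, not implemented, ... */
    }
}
```

Under this reading, the RHEL result corresponds to the `!got_reply` case being reported as EAI_NONAME, while Ubuntu's corresponds to the `ns_r_servfail` case.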

R.. GitHub STOP HELPING ICE