
I'm trying to call getaddrinfo from Python, through ctypes / libc, on Mac OS, in order to find the IP address of a domain.

The call appears to succeed: no error code is returned, and ai_addrlen is set to 28, which I understand is the appropriate length for an IPv6 address. However, ai_addr appears to be a null pointer, and I'm not sure how to begin to debug it.

How can I find the IP address of a domain using libc.getaddrinfo?

from ctypes import (
    byref,
    c_char, c_char_p, c_int, c_size_t, c_void_p,
    CDLL,
    POINTER,
    pointer,
    Structure,
)

libc = CDLL(None)

class c_addrinfo(Structure):
    pass

c_addrinfo._fields_ = [
    ('ai_flags', c_int),
    ('ai_family', c_int),
    ('ai_socktype', c_int),
    ('ai_protocol', c_int),
    ('ai_addrlen', c_size_t),
    ('ai_addr', c_void_p),
    ('ai_canonname', c_char_p),
    ('ai_next', POINTER(c_addrinfo)),
]

c_addrinfo_p = POINTER(c_addrinfo)
result = c_addrinfo_p()
error = libc.getaddrinfo(
    c_char_p(b'www.google.com'),
    None,
    None,
    byref(result),
)

print(error)                          # 0
print(result.contents.ai_canonname)   # b'\x1c\x1e'
print(result.contents.ai_addrlen)     # 28
print(bool(result.contents.ai_addr))  # False, i.e. a null pointer

libc.freeaddrinfo(result)
Michal Charemza
  • Is there a reason `socket.getaddrinfo` won't work for you? – zwol Dec 01 '18 at 21:56
  • @zwol In terms of getting the IP address: none as far as I know. However, I am looking to get more familiarity with making syscalls from Python, and this seemed like a reasonable place to start. – Michal Charemza Dec 01 '18 at 22:00
  • getaddrinfo is not a "syscall" in the sense that term is used by C programmers. It's one of the more complicated routines in the C library and I do not think it is a good place to start with `ctypes`. This doesn't make the question a bad one taken strictly on its own terms, though. – zwol Dec 01 '18 at 22:59
  • Addressing the question on its own terms, when I run this program, the output I get is `0`, `None`, `16`, and `True` (on four lines). So I don't know why it's not working for you. – zwol Dec 01 '18 at 23:00
  • @zwol Ah, understood about it not being a syscall. What OS are you running on? – Michal Charemza Dec 01 '18 at 23:02
  • The above was Linux. I just tried it on a NetBSD box (which is much more similar to MacOS) and there I get the same output you do, including the `b'\x1c\x1e'` for `ai_canonname` and 28 for `ai_addrlen`. Unfortunately I haven't the foggiest idea why it should do this, or why the behavior should vary between BSD and GNU C libraries. – zwol Dec 01 '18 at 23:17
  • At this point I must apologize for not being able to actually help, and leave this question for someone else. – zwol Dec 01 '18 at 23:18
  • @zwol Ah but thank you! Very good to know about the differences, even if so far _why_ there are differences isn’t known. – Michal Charemza Dec 01 '18 at 23:19
  • @zwol If you're interested, I think I've found the source of the difference https://stackoverflow.com/a/53584085/1319998 – Michal Charemza Dec 02 '18 at 20:22

1 Answer

According to the Linux man page for getaddrinfo, the addrinfo struct in which the results of getaddrinfo are stored is defined as

struct addrinfo {
    int              ai_flags;
    int              ai_family;
    int              ai_socktype;
    int              ai_protocol;
    socklen_t        ai_addrlen;
    struct sockaddr *ai_addr;
    char            *ai_canonname;
    struct addrinfo *ai_next;
};

and according to the FreeBSD man page for getaddrinfo (or one of Apple's man pages for getaddrinfo, which is similar), its addrinfo looks the same, assuming all the types match up:

struct addrinfo {
     int ai_flags;             /* input flags */
     int ai_family;            /* address family for socket */
     int ai_socktype;          /* socket type */
     int ai_protocol;          /* protocol for socket */
     socklen_t ai_addrlen;     /* length of socket-address */
     struct sockaddr *ai_addr; /* socket-address for socket */
     char *ai_canonname;       /* canonical name for service location */
     struct addrinfo *ai_next; /* pointer to next in list */
};

However, looking in the FreeBSD source (or one of the open source Apple projects, which is similar), we see a subtly different definition:

struct addrinfo {
    int ai_flags;             /* AI_PASSIVE, AI_CANONNAME, AI_NUMERICHOST */
    int ai_family;            /* AF_xxx */
    int ai_socktype;          /* SOCK_xxx */
    int ai_protocol;          /* 0 or IPPROTO_xxx for IPv4 and IPv6 */
    socklen_t ai_addrlen;     /* length of ai_addr */
    char *ai_canonname;       /* canonical name for hostname */
    struct sockaddr *ai_addr; /* binary address */
    struct addrinfo *ai_next; /* next structure in linked list */
};

It's very easy to miss, but ai_canonname and ai_addr appear in the opposite order to the one documented. This means that the Python ctypes definition for Mac (and similar platforms) should be

class c_addrinfo(Structure):
    pass

c_addrinfo._fields_ = [
    ('ai_flags', c_int),
    ('ai_family', c_int),
    ('ai_socktype', c_int),
    ('ai_protocol', c_int),
    ('ai_addrlen', c_size_t),
    ('ai_canonname', c_char_p),
    ('ai_addr', c_void_p),
    ('ai_next', POINTER(c_addrinfo)),
]

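or, for a definition that works on both Mac and Linux (with no comment on other platforms):

import platform

# Declare the class first so that ai_next can refer to it
class c_addrinfo(Structure):
    pass

c_addrinfo._fields_ = [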
    ('ai_flags', c_int),
    ('ai_family', c_int),
    ('ai_socktype', c_int),
    ('ai_protocol', c_int),
    ('ai_addrlen', c_size_t),
] + ([
    ('ai_canonname', c_char_p),
    ('ai_addr', c_void_p),
] if platform.system() == 'Darwin' else [
    ('ai_addr', c_void_p),
    ('ai_canonname', c_char_p),
]) + [
    ('ai_next', POINTER(c_addrinfo)),
]

With either of these versions, on Mac, the ai_addr pointer is no longer null. You can also see an early/experimental version that parses the addresses themselves and works on both Mac and Linux.
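To answer the original question of actually getting the IP addresses out, here is a minimal sketch of walking the result list and decoding ai_addr. It is not the experimental version mentioned above. The `resolve` helper name is hypothetical, and the sketch makes several assumptions: that socklen_t is a 32-bit unsigned integer (hence `c_uint32` rather than `c_size_t`), that the IPv4 address sits at byte offset 4 of the sockaddr and the IPv6 address at byte offset 8, and that FreeBSD and NetBSD share Darwin's field order, as the FreeBSD source and the comments here suggest. Other platforms are untested.

```python
import platform
import socket
from ctypes import (
    CDLL, POINTER, Structure, byref, c_char_p, c_int, c_uint32, c_void_p,
    string_at,
)

libc = CDLL(None)

class c_addrinfo(Structure):
    pass

# Assumption: these BSD-derived systems put ai_canonname before ai_addr
_bsd_order = platform.system() in ('Darwin', 'FreeBSD', 'NetBSD')

c_addrinfo._fields_ = [
    ('ai_flags', c_int),
    ('ai_family', c_int),
    ('ai_socktype', c_int),
    ('ai_protocol', c_int),
    ('ai_addrlen', c_uint32),  # assumption: socklen_t is 32 bits wide
] + ([
    ('ai_canonname', c_char_p),
    ('ai_addr', c_void_p),
] if _bsd_order else [
    ('ai_addr', c_void_p),
    ('ai_canonname', c_char_p),
]) + [
    ('ai_next', POINTER(c_addrinfo)),
]

# Declaring argument types lets ctypes check and convert the arguments
libc.getaddrinfo.argtypes = [
    c_char_p, c_char_p, POINTER(c_addrinfo), POINTER(POINTER(c_addrinfo)),
]
libc.getaddrinfo.restype = c_int
libc.freeaddrinfo.argtypes = [POINTER(c_addrinfo)]
libc.freeaddrinfo.restype = None

# (offset, length) of the raw address bytes inside sockaddr_in / sockaddr_in6
_ADDRESS_SPAN = {socket.AF_INET: (4, 4), socket.AF_INET6: (8, 16)}

def resolve(host):
    """Return the distinct IP addresses for host (bytes) as strings."""
    result = POINTER(c_addrinfo)()
    error = libc.getaddrinfo(host, None, None, byref(result))
    if error:
        raise OSError('getaddrinfo failed with code {}'.format(error))
    addresses = []
    try:
        node = result
        while node:  # a null ctypes pointer is falsy
            info = node.contents
            span = _ADDRESS_SPAN.get(info.ai_family)
            if span and info.ai_addr:
                offset, length = span
                raw = string_at(info.ai_addr + offset, length)
                text = socket.inet_ntop(info.ai_family, raw)
                # The same address is usually repeated once per socket type
                if text not in addresses:
                    addresses.append(text)
            node = info.ai_next
    finally:
        libc.freeaddrinfo(result)
    return addresses
```

Calling `resolve(b'www.google.com')` should then return a list of IPv4 and/or IPv6 address strings, depending on what the resolver hands back. Passing a numeric host such as `b'127.0.0.1'` exercises the same code path without needing DNS.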

Edit: it looks like the documentation issue has already been reported to FreeBSD

Michal Charemza
  • Nice catch! That neatly explains both why `ai_addr` was coming out as NULL, and why `ai_canonname` appeared to be garbage -- it was interpreting the first few bytes of the address as a string! I can confirm that NetBSD uses the same ordering as Darwin and FreeBSD, and that its `getaddrinfo(3)` manpage has the same inconsistency. – zwol Dec 02 '18 at 20:54