
I am writing a program that connects to different websites and requests and downloads web pages. I am doing this in large part to learn and properly understand web programming. I would like to know whether the linked list of struct addrinfo returned by getaddrinfo is arranged in any particular order, and if so, whether the choice of IP address to connect to matters in any way.

For example, if I run getaddrinfo("google.com", "http", &hints, &res), res will sometimes have up to seven internet addresses. Does it make a difference in any way if I connect to the first one or the last one? Please note that I have studied the manual pages for this function and to my understanding, my question is not answered there.
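To make the question concrete, here is a minimal sketch that prints the addresses in the order getaddrinfo returns them. The helper name print_addresses is hypothetical, and setting ai_socktype to SOCK_STREAM is an assumption made so that each address appears once instead of once per socket type.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from any library): prints every address that
 * getaddrinfo() returns for host/service, in list order. Returns the number
 * of addresses printed, or -1 if the lookup failed. */
int print_addresses(const char *host, const char *service)
{
    struct addrinfo hints, *res, *p;
    char text[INET6_ADDRSTRLEN];
    int count = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* assumption: TCP only */

    int rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        const void *addr;

        if (p->ai_family == AF_INET)
            addr = &((struct sockaddr_in *)p->ai_addr)->sin_addr;
        else if (p->ai_family == AF_INET6)
            addr = &((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
        else
            continue; /* skip families this sketch does not print */

        if (inet_ntop(p->ai_family, addr, text, sizeof text) != NULL)
            printf("%d: %s\n", count++, text);
    }

    freeaddrinfo(res); /* always release the list when done */
    return count;
}
```

Calling print_addresses("google.com", "http") from main would list whatever the resolver returns on that machine, so the output differs between hosts and between runs.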

Paul Beckingham
FutureSci
  • I would guess that the order is the same as the DNS entry for the domain you looked up. Don't forget to call freeaddrinfo() when done with the result. – Yetti99 Feb 07 '15 at 15:56
  • Thanks, I do believe that you also partially answered another question that I was going to research when I progressed further (if you're curious, it's how getaddrinfo gets this info; I plan to read some of the libc source after I'm finished learning it). – FutureSci Feb 07 '15 at 15:59
  • getaddrinfo does a DNS query. You can read about that on Wikipedia. If you have a Linux command line you can run the commands dig or nslookup - they might use getaddrinfo but probably do it directly since they provide all info. Here is a site that gives you access - http://network-tools.com/nslook/ – Yetti99 Feb 07 '15 at 16:06
  • According to the man page here: http://linux.die.net/man/3/getaddrinfo *"Normally, the application should try using the addresses in the order in which they are returned. The sorting function used within getaddrinfo() is defined in RFC 3484; the order can be tweaked for a particular system by editing /etc/gai.conf (available since glibc 2.5)."* This suggests studying **RFC 3484** should give you some information. https://www.ietf.org/rfc/rfc3484.txt – Galik Feb 07 '15 at 16:09
  • I saw this, but that sentence does not say why I should use the first or what difference it makes. It simply says to use the first. In my humble opinion, what if the particular program I am writing would benefit from using the last address? Also, I am currently reading that text and don't really understand much of it. I didn't ask for help here (as yet), because I'm sure I can figure it out with enough time. – FutureSci Feb 07 '15 at 16:15
  • Section 6 of **RFC 3484** https://www.ietf.org/rfc/rfc3484.txt lists 10 rules it uses to sort the addresses into an order that should be most reliable and efficient. It is well worth a read, even if you can't understand all of it. – Galik Feb 07 '15 at 16:25
  • possible duplicate of [Is it necessary to attempt to connect to all addresses returned by getaddrinfo()?](http://stackoverflow.com/questions/11572843/is-it-necessary-to-attempt-to-connect-to-all-addresses-returned-by-getaddrinfo) – edmz Feb 07 '15 at 16:53
  • @worlboss: The order returned should be optimized for your host's availability/type of IPv6 connectivity. This is what RFC 3484 is about. For example when you have IPv6 but only via tunneling, IPv4 addresses will come out first. – R.. GitHub STOP HELPING ICE Feb 07 '15 at 17:04

1 Answer


Since you have multiple addrinfo structures organized in a linked list, you should iterate over it and try to connect until a connection is successful. That is:

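struct addrinfo *ptr = res;

while (ptr != NULL) {
     // socket_fd must have been created for this entry's address family
     int rc = connect(socket_fd, ptr->ai_addr, ptr->ai_addrlen);
     if (rc == 0)
         break; // we managed to connect successfully
     // handle error, then move on to the next address in the list
     ptr = ptr->ai_next;
}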

This may be necessary because a DNS lookup can return multiple entries, which is why the results come as a linked list. If connect succeeds, you're done; if it fails, advance the pointer to the next element and try again, for each address the lookup returned. Since connect can fail for several reasons, check errno to decide whether further attempts are worthwhile. As @R.. pointed out, you must also give connect a new socket for each attempt, releasing the previous one, because the address family may change between entries; getaddrinfo helps here, since each addrinfo node carries that information (ai_family).

However, this is typically unnecessary: the first result will generally work. Personally, if I may, I have never encountered the need to loop through the linked list but it's still good to know in case you might need that.

getaddrinfo(3)

There are several reasons why the linked list may have more than one addrinfo structure, including: the network host is multihomed, accessible over multiple protocols (e.g., both AF_INET and AF_INET6); or the same service is available from multiple socket types (one SOCK_STREAM address and another SOCK_DGRAM address, for example). Normally, the application should try using the addresses in the order in which they are returned. The sorting function used within getaddrinfo() is defined in RFC 3484; the order can be tweaked for a particular system by editing /etc/gai.conf (available since glibc 2.5).
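As a rough sketch of trying the addresses in the order returned, with a fresh socket for each entry, something like the following might work. The helper name connect_to_host is hypothetical, and the hints (AF_UNSPEC, SOCK_STREAM) are assumptions suited to a TCP client; real code would likely inspect errno after each failure instead of silently moving on.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from any library): walks the getaddrinfo() list
 * in order and returns the first socket that connects, or -1 if none did. */
int connect_to_host(const char *host, const char *service)
{
    struct addrinfo hints, *res, *p;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* assumption: IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* assumption: TCP */

    int rc = getaddrinfo(host, service, &hints, &res);
    if (rc != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(rc));
        return -1;
    }

    for (p = res; p != NULL; p = p->ai_next) {
        /* Create a socket that matches this entry's family and type. */
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd == -1)
            continue; /* e.g. family not supported on this host */

        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break; /* connected: keep fd and stop trying */

        close(fd); /* release the failed socket before the next attempt */
        fd = -1;
    }

    freeaddrinfo(res);
    return fd; /* -1 if every entry failed */
}
```

The caller owns the returned descriptor and is responsible for closing it.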

edmz
  • Just performing `connect` in the loop is not going to work unless you either forced an address family (IPv4 or IPv6) or used `AI_V4MAPPED` so that you can treat both the same. Otherwise you need to `close` and reopen a `socket` with the right address family for each address you try. – R.. GitHub STOP HELPING ICE Feb 07 '15 at 17:06
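
For illustration, forcing a single address family as the last comment describes could be sketched like this. The helper name resolve_ipv4_only is hypothetical, and choosing AF_INET over AF_INET6 is an arbitrary assumption. Note that POSIX leaves the state of a socket unspecified after a failed connect, so closing and recreating the socket between attempts is still the portable choice even when every entry has the same family.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper (not from any library): resolves host/service with
 * the address family forced to IPv4, so every entry in the returned list
 * has ai_family == AF_INET. Returns NULL on failure; the caller must pass
 * a non-NULL result to freeaddrinfo(). */
struct addrinfo *resolve_ipv4_only(const char *host, const char *service)
{
    struct addrinfo hints, *res = NULL;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* assumption: IPv4 only */
    hints.ai_socktype = SOCK_STREAM; /* assumption: TCP */

    if (getaddrinfo(host, service, &hints, &res) != 0)
        return NULL;

    return res;
}
```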