How does a web browser parse the raw bytes of each part of a DNS name to a human readable form?

Question

My goal is to parse the structure of a DNS packet programmatically using Python. Consider the snapshot where my browser sends a long domain name over the wire without complaining.

Let's take a close look at the byte string: b'\x03www99dksjfhkdfgh534534534dkfghkldfhglksdfhg435634634dkfghlkd\x03com\x00'

I can write my code in Python with logic like this:

So my question is: How does a web browser parse the raw bytes of each part of a DNS name to a human readable form? Any form of advice/help is welcomed.

Programming questions should be asked on [so]. In the meantime you should read RFC 1034 and RFC 1035. — Michael Hampton, Jun 27 '20 at 15:44
I've read RFC 1035 before posting..the "Name space definitions" section dealt with the octet lengths of each domain parts..I haven't found the parsing logic though.. — repzero, Jun 27 '20 at 15:55
@repzero If you had already read about what the length prefix means, can you clarify what specifically is unclear? — Håkan Lindqvist, Jun 27 '20 at 16:30
@HåkanLindqvist Sure..look at the last snapshot. I'm looping through each character to find the beginning of a DNS label/part. I'm looking for a non-ASCII octet (the length/count). When the loop hits the octet length/count, it stops and saves that part of the domain name as a label/part. The issue is, if the label count number is 57 or more, it is displayed as an ASCII character. Look at the domain name: it starts with "9", and the label count in ASCII representation is also "9". So how can one know which of the "9"s is the label count? Specifically, what parsing logic does a browser use? — repzero, Jun 27 '20 at 16:46
@repzero You look at the first byte (length specifier), then read as many bytes as it says, then you look at the next byte (next length specifier) and read as many bytes as it says. At no point do you have to care about if something is a printable character. — Håkan Lindqvist, Jun 27 '20 at 16:51
@HåkanLindqvist I had the same idea, but what if you have a domain name of b"ww3.99autosales......com" what is the first byte specifier? 3 or the first 9 or the second 9?. My browser did send a dns query for such a name..and I suppose the name server parsed it correctly using some logic. — repzero, Jun 27 '20 at 16:55
@repzero The first byte is the length of the first label, then after as many bytes as that said, there is a new label (where the first byte again specifies the length). Look at the example in my answer. — Håkan Lindqvist, Jun 27 '20 at 17:01
@HåkanLindqvist hmm..I see your point...so, instead of looping from the end, loop from the start and count the number of characters specified in the byte specifier, and when the count ends, treat the next byte as a byte specifier! Is this what you're implying? If so, can you add this to your answer, so someone who's looking for the logic can easily understand? — repzero, Jun 27 '20 at 17:10
@repzero Yes, definitely read the value from the start. Otherwise you cannot know the lengths of the labels (which specifies where to stop). — Håkan Lindqvist, Jun 27 '20 at 17:12
@HåkanLindqvist Can you add this to the answer, so that I can accept it? — repzero, Jun 27 '20 at 17:14

score 2 · Accepted Answer · edited Oct 07 '21 at 07:59

The wire-format of names in the DNS protocol is (by example):

The name www.example.com. splits into four labels:

"www", "example", "com" and "" (there's always the empty label at the end, representing the root of the tree).

Each label is encoded with a one-byte prefix specifying the length of the label (the first two bits are reserved, leaving a six-bit integer, so a label is at most 63 bytes long), followed by the raw contents of the label (as many bytes as the prefix specified).

The example name above is \x03www\x07example\x03com\x00 (if we use the normal \xNN format for values of bytes that are not necessarily readable, with the value in hexadecimal).
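To make the encoding concrete, here is a minimal Python sketch that produces the wire format for a dotted name. The function name `encode_name` is hypothetical, and the sketch assumes a plain ASCII name with no escaped dots and no compression.

```python
def encode_name(name: str) -> bytes:
    """Encode a dotted DNS name into uncompressed wire format (sketch)."""
    # Drop one trailing dot so "www.example.com." and "www.example.com" agree.
    if name.endswith("."):
        name = name[:-1]
    labels = name.split(".") if name else []
    out = bytearray()
    for label in labels:
        raw = label.encode("ascii")  # assumption: ASCII-only labels
        # Only six bits of the length byte are usable, so a label holds
        # at most 63 bytes; an empty label is only valid as the root.
        if not 1 <= len(raw) <= 63:
            raise ValueError(f"invalid label length: {len(raw)}")
        out.append(len(raw))  # one-byte length prefix
        out += raw            # raw label contents
    out.append(0)             # zero-length root label ends the name
    return bytes(out)


print(encode_name("www.example.com."))
# b'\x03www\x07example\x03com\x00'
```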

When reading a name in wire-format, you would start at the first byte, look at how long it specifies that the next label is, read that many bytes as that label, repeating this process until you get to the zero-length label that represents the root of the tree (and end of the name).
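The reading procedure above can be sketched in Python roughly as follows. The function name `read_name` and the sample data are made up for illustration, and the sketch only handles uncompressed names. Indexing a `bytes` object yields an integer, so it never matters whether a length byte happens to look like a printable character.

```python
def read_name(data: bytes, offset: int = 0):
    """Return (labels, next_offset) for an uncompressed wire-format name."""
    labels = []
    while True:
        if offset >= len(data):
            raise ValueError("name runs past the end of the data")
        length = data[offset]  # length prefix, read as an integer
        offset += 1
        if length == 0:        # zero-length root label: end of the name
            break
        if length > 63:        # top bits set: not a plain label
            raise ValueError("unsupported label type")
        label = data[offset:offset + length]
        if len(label) < length:
            raise ValueError("label is truncated")
        labels.append(label)
        offset += length       # the next byte is the next length prefix
    return labels, offset


# A 57-byte label: its length byte is 57, which is also ASCII "9",
# and the label itself starts with "9" as well.
long_label = b"9" + b"a" * 56
wire = b"\x03www" + bytes([len(long_label)]) + long_label + b"\x03com\x00"

labels, end = read_name(wire)
print(b".".join(labels).decode("ascii") + ".")
```

Because the parser walks forward from the first byte, the first "9" after `www` is always taken as a length and the second "9" as label content; there is no ambiguity to resolve.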

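In a packet, however, there is also the possibility of compressing names: a length byte with both of its first two bits set starts a two-byte pointer, which refers to a name (or the tail of a name) that appeared earlier in the packet.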
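A hedged sketch of how compression might be handled when reading a name, based on the pointer format in RFC 1035 section 4.1.4 (a byte with the top two bits set, followed by one more byte, together giving a 14-bit offset from the start of the message). The function name `read_name_compressed`, the jump limit, and the toy packet are assumptions for illustration, not taken from any particular implementation.

```python
def read_name_compressed(packet: bytes, offset: int):
    """Return (labels, next_offset), following compression pointers."""
    labels = []
    next_offset = None  # where the caller resumes after this name
    jumps = 0
    while True:
        if offset >= len(packet):
            raise ValueError("name runs past the end of the packet")
        length = packet[offset]
        if length & 0xC0 == 0xC0:  # both top bits set: a pointer
            if offset + 1 >= len(packet):
                raise ValueError("pointer is truncated")
            if next_offset is None:
                next_offset = offset + 2  # the name ends after the first pointer
            offset = ((length & 0x3F) << 8) | packet[offset + 1]
            jumps += 1
            if jumps > len(packet):  # crude guard against pointer loops
                raise ValueError("too many compression pointers")
            continue
        if length & 0xC0:  # 01 and 10 prefixes are reserved
            raise ValueError("unsupported label type")
        offset += 1
        if length == 0:  # root label: end of the name
            break
        label = packet[offset:offset + length]
        if len(label) < length:
            raise ValueError("label is truncated")
        labels.append(label)
        offset += length
    if next_offset is None:
        next_offset = offset
    return labels, next_offset


# Toy packet: 12 placeholder header bytes, "example.com." at offset 12,
# then "www" followed by a pointer (0xC0 0x0C) back to offset 12.
packet = bytes(12) + b"\x07example\x03com\x00" + b"\x03www\xc0\x0c"
print(read_name_compressed(packet, 25))
# ([b'www', b'example', b'com'], 31)
```

The returned offset is the position just after the pointer, not after the pointed-to name, since that is where the next field of the packet begins.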

Unless the purpose is specifically learning, you might want to look at e.g. dnspython rather than starting from scratch.
