Why is the byte stream returned by Python's socket.recvfrom different from the one captured by Wireshark?

Question

I used a Python socket to send a DNS query packet and listen for the response. I received a DNS response packet from socket.recvfrom(2048) as expected. Strangely, when I compared that response packet with the packet captured by Wireshark, I found many differences.

The differences show up as 3f bytes in the second picture.

The DNS response packet (the highlighted part) captured by Wireshark

The DNS response packet returned by socket.recvfrom(2048)

The code that creates the socket:

    ipv = check_ip(dst)
    udp = socket.getprotobyname(Proto.UDP)
    if ipv == IPV.ERROR:
        return None
    elif ipv == IPV.IPV4:
        return socket.socket(socket.AF_INET, socket.SOCK_DGRAM, udp)
    elif ipv == IPV.IPV6:
        return socket.socket(socket.AF_INET6, socket.SOCK_DGRAM, udp)
    else:
        return None

The code that receives the DNS response packet:

    remained_time = 0
    while True:
        remained_time = self.timeout - timeit.default_timer() + sent_time
        readable = select.select([sock], [], [], remained_time)[0]
        if len(readable) == 0:
            return (-1, None)

        packet, addr = sock.recvfrom(4096)

Please post the text, not pictures of the text. And tell us which part of the difference you found interesting. They look about the same to me. – John Zwinck Sep 12 '18 at 05:55
No, I mean the byte stream itself. Compare the highlighted part with the bytes in the second picture. The differences show up as `3f` in the second picture. – Tree Simith Sep 12 '18 at 06:24

Remy Lebeau · Accepted Answer · 2018-09-12T13:40:50.420

Byte 0x3F is the ASCII '?' character. That commonly means the data is being treated as text and is passing through a charset conversion that doesn't support the bytes being converted.

Notice that 0x3F is replacing only the bytes that are > 0x7F (the last byte supported by ASCII). Non-ASCII bytes in the range of 0x80-0xFF are subject to charset interpretation.
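As a rough illustration of that effect, the sketch below pushes a few bytes through a lossy text conversion. The helper `lossy_ascii_roundtrip` and the sample bytes are hypothetical, and the choice of `latin-1` and `ascii` is only an assumption made for the demonstration; the conversion that actually mangled the packet in the question is not known.

```python
def lossy_ascii_roundtrip(raw: bytes) -> bytes:
    """Treat raw bytes as text, then force the text into ASCII.

    Hypothetical helper: latin-1 maps every byte to one character,
    and errors="replace" substitutes '?' (0x3F) for every character
    that ASCII cannot represent.
    """
    text = raw.decode("latin-1")
    return text.encode("ascii", errors="replace")


# Made-up sample: two ASCII-range bytes followed by two bytes above 0x7F.
header = bytes.fromhex("1a2b8180")
mangled = lossy_ascii_roundtrip(header)

# Only the bytes above 0x7F are replaced by 0x3F.
print(header.hex(), "->", mangled.hex())
```

Bytes at or below 0x7F pass through unchanged in this sketch, which matches the pattern described above.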

That makes sense, as you are using the version of recvfrom() that returns a string, so the received bytes need to be converted to Python's default string encoding.

Since you need raw bytes instead, use recvfrom_into() to fill a pre-allocated bytearray, e.g.:

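    # Pre-allocated buffer that recvfrom_into() fills with the raw bytes
    packet = bytearray(4096)
    remained_time = 0
    while True:
        # Time left before the overall timeout expires
        remained_time = self.timeout - timeit.default_timer() + sent_time
        readable = select.select([sock], [], [], remained_time)[0]
        if len(readable) == 0:
            return (-1, None)
        # nbytes is the number of bytes actually written into packet
        nbytes, addr = sock.recvfrom_into(packet)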

Then use only the first nbytes bytes of packet (for example, packet[:nbytes]) as needed.
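To check that recvfrom_into() hands back the bytes unchanged, a minimal self-contained sketch can send a datagram over the loopback interface and compare what arrives. The function name `udp_roundtrip` and the sample payload are hypothetical, and the sketch assumes loopback UDP traffic is allowed on the machine.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send payload over loopback UDP and return the raw bytes received."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)         # avoid blocking forever if nothing arrives
        sender.sendto(payload, receiver.getsockname())

        buf = bytearray(4096)
        nbytes, _addr = receiver.recvfrom_into(buf)
        # Keep only the bytes that were actually written into the buffer.
        return bytes(buf[:nbytes])
    finally:
        sender.close()
        receiver.close()


# Made-up 12-byte payload containing bytes above 0x7F (0x81, 0x80).
sample = bytes.fromhex("1a2b81800001000100000000")
received = udp_roundtrip(sample)
print(received.hex())
```

Printing with .hex() sidesteps any text encoding, so the output can be compared directly with the hex view in a packet capture.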

edited Sep 12 '18 at 13:40

answered Sep 12 '18 at 07:05

Remy Lebeau


Thanks a lot! There still seem to be some bytes lower than `0x7f` that were translated into `0x3f`. For example, `shifen·com` in the first picture was translated into `shifen?·?+` in the second picture. – Tree Simith Sep 12 '18 at 12:58
@TreeSimith that goes back to my point about "passing through a charset conversion that **doesn't support the bytes being converted**". Whatever charset was being used to generate the `string` clearly doesn't define any characters for bytes `0x6E` and `0x63` in the context they appeared. You will also notice that the `string` contains a `0x16` character that isn't in the source bytes. Your data is getting corrupted when converted to `string`, so get rid of the conversion. – Remy Lebeau Sep 12 '18 at 13:44
