I am running two pods in Kubernetes. Pod A sends a connection request to pod B, but pod B responds with a TCP RST. Is there a way to tell from the Wireshark capture why the reset happened? If it cannot be concluded completely from the output below, pointers on where else to look for this issue would also be fine.

Below is the Wireshark capture of the RST packet. It was captured from inside the client pod (A).

POD A (10.244.0.109) -- service A (10.103.61.120) ----------- tcp channel --------- service B (10.111.125.227) -- POD B (10.244.0.133)

Above is the setup diagram.

        43781   2023-08-24 07:05:17.182965  0.000032    10.111.125.227  10.244.0.109    TCP 56  64  4560 → 39868 [RST] Seq=1 Win=0 Len=0

        Frame 43781: 56 bytes on wire (448 bits), 56 bytes captured (448 bits)
        Encapsulation type: Linux cooked-mode capture v1 (25)
        Arrival Time: Aug 24, 2023 12:35:17.182965000 India Standard Time
        [Time shift for this packet: 0.000000000 seconds]
        Epoch Time: 1692860717.182965000 seconds
        [Time delta from previous captured frame: 0.000032000 seconds]
        [Time delta from previous displayed frame: 0.000032000 seconds]
        [Time since reference or first frame: 1866.143300000 seconds]
        Frame Number: 43781
        Frame Length: 56 bytes (448 bits)
        Capture Length: 56 bytes (448 bits)
        [Frame is marked: False]
        [Frame is ignored: False]
        [Protocols in frame: sll:ethertype:ip:tcp]
        [Coloring Rule Name: TCP RST]
        [Coloring Rule String: tcp.flags.reset eq 1]
    Linux cooked capture v1
        Packet type: Unicast to us (0)
        Link-layer address type: Ethernet (1)
        Link-layer address length: 6
        Source: ba:72:a7:1d:e4:65 (ba:72:a7:1d:e4:65)
        Unused: 0000
        Protocol: IPv4 (0x0800)
    Internet Protocol Version 4, Src: 10.111.125.227, Dst: 10.244.0.109
        0100 .... = Version: 4
        .... 0101 = Header Length: 20 bytes (5)
        Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
            0000 00.. = Differentiated Services Codepoint: Default (0)
            .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
        Total Length: 40
        Identification: 0x0000 (0)
        010. .... = Flags: 0x2, Don't fragment
            0... .... = Reserved bit: Not set
            .1.. .... = Don't fragment: Set
            ..0. .... = More fragments: Not set
        ...0 0000 0000 0000 = Fragment Offset: 0
        Time to Live: 64
        Protocol: TCP (6)
        Header Checksum: 0xa71d [validation disabled]
        [Header checksum status: Unverified]
        Source Address: 10.111.125.227
        Destination Address: 10.244.0.109
    Transmission Control Protocol, Src Port: 4560, Dst Port: 39868, Seq: 1, Len: 0
        Source Port: 4560
        Destination Port: 39868
        [Stream index: 379]
        [Conversation completeness: Incomplete (40)]
        [TCP Segment Len: 0]
        Sequence Number: 1    (relative sequence number)
        Sequence Number (raw): 937229753
        [Next Sequence Number: 1    (relative sequence number)]
        Acknowledgment Number: 0
        Acknowledgment number (raw): 0
        0101 .... = Header Length: 20 bytes (5)
        Flags: 0x004 (RST)
            000. .... .... = Reserved: Not set
            ...0 .... .... = Accurate ECN: Not set
            .... 0... .... = Congestion Window Reduced: Not set
            .... .0.. .... = ECN-Echo: Not set
            .... ..0. .... = Urgent: Not set
            .... ...0 .... = Acknowledgment: Not set
            .... .... 0... = Push: Not set
            .... .... .1.. = Reset: Set
                [Expert Info (Warning/Sequence): Connection reset (RST)]
                    [Connection reset (RST)]
                    [Severity level: Warning]
                    [Group: Sequence]
            .... .... ..0. = Syn: Not set
            .... .... ...0 = Fin: Not set
            [TCP Flags: ·········R··]
        Window: 0
        [Calculated window size: 0]
        [Window size scaling factor: -1 (unknown)]
        Checksum: 0x390b [unverified]
        [Checksum Status: Unverified]
        Urgent Pointer: 0
        [Timestamps]
            [Time since first frame in this TCP stream: 0.000032000 seconds]
            [Time since previous frame in this TCP stream: 0.000032000 seconds]

[screenshot: Wireshark output, client side]

[screenshot: Wireshark output, server side]

  • It is impossible to say from a single RST packet alone why it happened. It might be possible to speculate if the preceding traffic were known (i.e. more context) instead of only this single packet. Maybe the logs of the application which produced the RST would help. Maybe the full setup is needed to reproduce the problem. – Steffen Ullrich Aug 24 '23 at 15:54
  • The application log does not show why the reset happened. Maybe whoever wrote the open-source application forgot to add a log line where ideally there should be one. My understanding is that in a C program using a non-raw TCP socket, only close() can cause an RST or FIN (see the C sketch after these comments). So in this case, should I add logs around connection closing to debug the issue at the application level? – myquest9 sh Aug 24 '23 at 16:01
  • Again, there is absolutely nothing known here so far except "a RST packet was sent". No context of the preceding packets is known, no context of the setup and applications involved, no information on how it can be reproduced. Every idea why the RST was sent is thus pure speculation based on basically zero information, and so is any recommendation on how to debug it. – Steffen Ullrich Aug 24 '23 at 16:07
  • Yes, you are right. Instead of posting everything here, I wanted to know how to proceed further in such cases. Say this is a C application program causing the issue: what are the reasons a C socket program would cause this, so that I can add logs around those spots to debug more? In any case, I have posted a traffic-level snapshot in the main thread; please have a look. – myquest9 sh Aug 24 '23 at 16:26
  • Still too little context even to answer this reduced question. A single packet is simply not sufficient here. – Steffen Ullrich Aug 24 '23 at 16:34
  • I have added a snap in the main thread with the name "wireshark output". Can it provide some clue, or should I add the entire pcap file? – myquest9 sh Aug 24 '23 at 17:47
  • Much more useful now. Given that the RST comes after the client (10.244.0.109) has sent data following nearly an hour of inactivity, I'm speculating that this is not a problem of the application itself, but that the stateful (conntrack) firewall between the pods has lost the state for the connection due to an idle timeout. So it is not the application generating the RST but the firewall. I suspect that you captured the traffic at the client side, and you likely will not see the RST at the application side. The fix would be to enable TCP keep-alive, either at the system or application level, with a short timer (see the keep-alive sketch below). – Steffen Ullrich Aug 24 '23 at 18:26
  • _[RFC 9293, Transmission Control Protocol (TCP), 3.5.2, Reset Generation](https://datatracker.ietf.org/doc/html/rfc9293#section-3.5.2)_ explains it. – Ron Maupin Aug 25 '23 at 01:44
  • @SteffenUllrich I do not have any firewall, as this is pod-to-pod communication within the same k8s cluster. Does a stateful (conntrack) firewall still come into the picture in this case? – myquest9 sh Aug 25 '23 at 05:43
  • @myquest9sh: there is NAT involved, which needs state tracking, which uses conntrack. Again, do a packet capture not only on the side where you get the RST but also at the peer, and check whether it actually gets generated there or (my speculation) gets generated instead by conntrack in between the pods. – Steffen Ullrich Aug 25 '23 at 05:54
  • @SteffenUllrich What would the RST packet captured at the source look like? I know I can capture on the destination side in Wireshark using the filter tcp.flags.reset==1; does this filter work on the source side as well? I just wanted to know how the packet will look at the source so that I can identify it. – myquest9 sh Aug 25 '23 at 06:58
  • @myquest9sh: a RST is a RST; it does not look any different, apart from probably the IP address and maybe the port when NAT is done. But don't just watch for the RST: watch for the connection instead, and see if the same connection which got a RST on the client also got a RST on the server. – Steffen Ullrich Aug 25 '23 at 07:43
  • @SteffenUllrich I do not see any RST packet from the server end for that particular client and port. I have attached the Wireshark snap from the server as well in the main thread; can you please have a look? Does this confirm it is a firewall issue, as you guessed earlier? In the diagram I have mentioned both pod IPs and svc IPs, just for reference in Wireshark. – myquest9 sh Aug 25 '23 at 08:32
  • @myquest9sh: not having a RST on the server side supports my theory that this is about conntrack states. Note though that the two Wireshark pictures you provided are for different connections, as can easily be seen from the size of the transferred data. So to be sure, you need to do a packet capture at the same time on client and server to observe the difference for the same connection. – Steffen Ullrich Aug 25 '23 at 08:37
  • @SteffenUllrich Sorry for the confusion. My intention was just to show that there is only one RST packet from the server side, and that it belongs to something else, not to the connection we are talking about. Moreover, I cannot capture the same connection from the server: in the client pcap we see the connection between the client pod IP and the server svc IP, while on the server side we see the server pod IP and the client svc IP. There is always a service mediating between the two pods, so they cannot see each other and the traffic cannot be captured as a direct communication. – myquest9 sh Aug 25 '23 at 08:57
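
Regarding the comment above about what in a C program can produce the RST: below is a minimal sketch (mine, not from the thread, for illustration only) of the two common ways a C program using a plain (non-raw) TCP socket makes the Linux kernel emit an RST on close(). If the server application is suspected, these are the two spots worth logging around.

    /* Sketch only: the two usual application-level causes of an RST
     * on a plain TCP socket under Linux. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* 1. Abortive close: SO_LINGER with a linger time of 0 turns the
     *    subsequent close() into an immediate RST instead of a FIN. */
    static void abortive_close(int fd)
    {
        struct linger lg;
        lg.l_onoff  = 1;   /* enable linger */
        lg.l_linger = 0;   /* 0 seconds: discard send queue, send RST */
        if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
            perror("setsockopt(SO_LINGER)");
        close(fd);         /* kernel sends RST, not FIN */
    }

    /* 2. close() while data the peer sent is still unread in the
     *    receive buffer also makes the kernel answer with an RST. */
    static void close_with_unread_data(int fd)
    {
        close(fd);         /* pending unread data -> RST */
    }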

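And a minimal sketch of the application-level keep-alive fix Steffen suggests; the timer values here are placeholders and should be chosen well below the cluster's conntrack idle timeout. TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific socket options.

    /* Sketch only: enable TCP keep-alive with short timers so idle
     * connections keep their conntrack/NAT state alive. */
    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    static int enable_keepalive(int fd)
    {
        int on    = 1;
        int idle  = 60;  /* idle seconds before first probe (placeholder) */
        int intvl = 10;  /* seconds between probes (placeholder) */
        int cnt   = 5;   /* unanswered probes before the connection drops */

        if (setsockopt(fd, SOL_SOCKET,  SO_KEEPALIVE,  &on,    sizeof(on))    < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,  &idle,  sizeof(idle))  < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0 ||
            setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,   &cnt,   sizeof(cnt))   < 0) {
            perror("setsockopt(keepalive)");
            return -1;
        }
        return 0;
    }

The same effect is available system-wide, without touching the application, via the net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl and net.ipv4.tcp_keepalive_probes sysctls.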