
I need to monitor the network in my Kubernetes cluster, and I am using the netlink package written in Go to do it: https://github.com/vishvananda/netlink

I am able to read all the fields of the tcpInfo type defined here https://github.com/vishvananda/netlink/blob/9ada19101fc5585d550e5cc0b43c28873214820a/tcp.go#L20, send them to the Timescale database, and display them in Grafana.

However, a major requirement for me is to display the metrics as percentages, possibly by combining a few fields of the tcpInfo struct. These are my requirements:

  1. Delivery ratio: the ratio of packets sent to packets received at the other end. This looks complicated, but it can be extracted in Grafana using a DB query.
  2. Packet loss rate: the percentage of packets lost or dropped. I am not sure which metrics to use for this. I want it to be (packets lost / packets sent after acknowledgement). Three metrics seem to be available: lost, unacked, and sacked. I am not sure these will give me the correct value: according to this article, unacked should give the right value (after acknowledgment, for that window of time), but it always returns 0.
  3. Retransmission rate: the percentage of DL layer frames retransmitted. I have the same problem here: I am not sure which metrics should be used to derive it.

Here is a sample of the ss -it output on my node:

[vjain@hk-osfebn-1298 ~]$ ss -it
..
..
ESTAB      0      0                                                                                      10.118.228.4:52388                                                                                              10.118.223.244:amqp                 
     cubic wscale:7,9 rto:201 rtt:0.132/0.012 ato:40 mss:1448 rcvmss:536 advmss:1448 cwnd:10 ssthresh:9 bytes_acked:1063335432 bytes_received:14283091 segs_out:7802018 segs_in:6550978 send 877.6Mbps lastsnd:33778 lastrcv:28454 lastack:28454 pacing_rate 1748.5Mbps retrans:0/28 rcv_rtt:88378.6 rcv_space:35246
..
..

I can't make sense of the retrans or rtt output. In retrans:0/28, is 0 the number of retransmits and 28 the total number of packets sent? I can't find any metric with the value 28 in the Grafana dashboard among the fields shown in the ss output. Also, the ss output seems to contain nothing about lost packets, whereas the netlink package derives it from lost_out in include/linux/tcp.h: https://elixir.bootlin.com/linux/latest/source/include/linux/tcp.h

kostix
Varun Jain
  • I have no idea about how to answer the question as stated, but two random points which might or might not help you: 1) It may be easier to monitor the kernel directly via Netlink than to parse what `ss` tells; for instance, [this](https://github.com/elastic/gosigar/pull/60) covers 99% of what's needed, in plain Go; 2) `ss` is a part of the `iproute2` package which [is F/OSS](https://github.com/shemminger/iproute2)—you can find the `ss`'s code in `misc/ss.c` there; basically you can just see _what_ it formats in its output and dig deeper from there. – kostix Dec 09 '21 at 10:09
  • IOW, it's not a question about Go, k8s and telegraf but rather about making sense of what Linux exposes from its networking subsystem. – kostix Dec 09 '21 at 10:09
