I need to monitor the network of my Kubernetes cluster, and I am using the netlink package written in Go to do it: https://github.com/vishvananda/netlink
I am able to parse all the fields of the tcpInfo type defined here https://github.com/vishvananda/netlink/blob/9ada19101fc5585d550e5cc0b43c28873214820a/tcp.go#L20, send them to the Timescale database, and display them in Grafana.
However, a major requirement for me is to display the metrics as percentages, possibly by combining a few fields of the tcpInfo struct. These are my requirements:
- Delivery ratio: the ratio of packets sent/received at the other end. This looks complicated, but it can be extracted in Grafana using a database query.
- Packet loss rate: the percentage of packets lost or dropped. I am not sure which metrics to use for this. I want it to be (packets lost / packets sent after acknowledgement). I seem to have three metrics available: lost, unacked and sacked. I am not sure these will give me the correct value: according to this article, unacked should give me the right value (after acknowledgment, for that frame of time), but it always returns 0.
- Retransmission rate: the percentage of DL layer frames retransmitted. I have the same problem here: I am not sure which metrics should be used to derive this.
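To make the retransmission percentage concrete, here is a minimal sketch of the calculation I have in mind. It assumes two cumulative counters are available per socket, which I believe correspond to tcpi_segs_out and tcpi_total_retrans; the Sample type and RetransPercent function are hypothetical names of my own and are not part of the netlink package. Because the counters are cumulative, the sketch takes the difference between two polls rather than dividing the raw values.

```go
package main

import "fmt"

// Sample holds the counters from one poll of a socket. The type and field
// names are hypothetical; they are assumed to map to tcpi_segs_out and
// tcpi_total_retrans, both cumulative over the connection's lifetime.
type Sample struct {
	SegsOut      uint32
	TotalRetrans uint32
}

// RetransPercent returns retransmitted segments as a percentage of the
// segments sent between two polls. It returns 0 when nothing was sent.
func RetransPercent(prev, cur Sample) float64 {
	sent := cur.SegsOut - prev.SegsOut // unsigned subtraction tolerates wraparound
	if sent == 0 {
		return 0
	}
	retrans := cur.TotalRetrans - prev.TotalRetrans
	return 100 * float64(retrans) / float64(sent)
}

func main() {
	prev := Sample{SegsOut: 1000, TotalRetrans: 10}
	cur := Sample{SegsOut: 1200, TotalRetrans: 15}
	// 5 retransmitted out of 200 segments sent in the interval.
	fmt.Printf("%.2f%%\n", RetransPercent(prev, cur))
}
```

The same delta could presumably be computed in a Grafana query over the stored samples instead of in Go; whether these two counters are the right inputs is exactly what I am unsure about.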
Here is a sample output of ss -it on my node:
vjain@hk-osfebn-1298 ~]$ ss -it
..
..
ESTAB 0 0 10.118.228.4:52388 10.118.223.244:amqp
cubic wscale:7,9 rto:201 rtt:0.132/0.012 ato:40 mss:1448 rcvmss:536 advmss:1448 cwnd:10 ssthresh:9 bytes_acked:1063335432 bytes_received:14283091 segs_out:7802018 segs_in:6550978 send 877.6Mbps lastsnd:33778 lastrcv:28454 lastack:28454 pacing_rate 1748.5Mbps retrans:0/28 rcv_rtt:88378.6 rcv_space:35246
..
..
I can't seem to understand the output of retrans or rtt.
In retrans:0/28, is 0 the number of retransmits and 28 the total number of packets sent? I can't see any metric with the value 28 in the Grafana dashboard for the endpoints given in the ss output. Also, there seems to be no output related to lost packets, but the netlink package derives it from lost_out in include/linux/tcp.h:
https://elixir.bootlin.com/linux/latest/source/include/linux/tcp.h
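To compare the ss output against what ends up in the database, a small parser for the retrans token might help. This is only a sketch: parseRetrans is a hypothetical helper, and reading the two numbers as "retransmits currently in flight / total retransmits over the connection" is my assumption from skimming the iproute2 source, not something the ss man page states.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRetrans extracts the two numbers from an ss token such as
// "retrans:0/28". Naming them inFlight and total is an assumption about
// their meaning; the function itself only splits and parses the token.
func parseRetrans(line string) (inFlight, total uint64, ok bool) {
	for _, tok := range strings.Fields(line) {
		if !strings.HasPrefix(tok, "retrans:") {
			continue
		}
		a, b, found := strings.Cut(strings.TrimPrefix(tok, "retrans:"), "/")
		if !found {
			return 0, 0, false
		}
		x, err1 := strconv.ParseUint(a, 10, 32)
		y, err2 := strconv.ParseUint(b, 10, 32)
		if err1 != nil || err2 != nil {
			return 0, 0, false
		}
		return x, y, true
	}
	return 0, 0, false
}

func main() {
	line := "cwnd:10 ssthresh:9 segs_out:7802018 retrans:0/28 rcv_space:35246"
	inFlight, total, ok := parseRetrans(line)
	fmt.Println(inFlight, total, ok)
}
```

If that reading is right, the 28 would be expected to match a cumulative retransmit counter rather than the number of packets sent, which is what I would like confirmed.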