Many papers mention that RDMA SEND/RECV is typically used for commands, signaling, or small messages, but I can't find a convincing explanation of why.
I know that the memory semantics (one-sided WRITE/READ) can reduce latency and CPU load further than SEND/RECV, since the remote CPU is not involved in the transfer. So wouldn't WRITE/READ simply dominate in all cases?
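For concreteness, here is a rough sketch of the difference between the two kinds of operation as I understand it. The types and function names below (`work_request`, `make_send`, `make_write`, `needs_remote_info`) are hypothetical, simplified stand-ins written for this question and are not the libibverbs API. In real libibverbs the corresponding opcodes are `IBV_WR_SEND` and `IBV_WR_RDMA_WRITE`, and the remote address and key live in `wr.rdma.remote_addr` and `wr.rdma.rkey` of `struct ibv_send_wr`.

```c
#include <stdint.h>

/* Hypothetical, simplified model; NOT the libibverbs types. */
enum op_kind { OP_SEND, OP_RDMA_WRITE };

struct work_request {
    enum op_kind kind;
    const void  *local_buf;   /* data to transmit */
    uint32_t     length;      /* number of bytes */
    uint64_t     remote_addr; /* meaningful only for OP_RDMA_WRITE */
    uint32_t     rkey;        /* meaningful only for OP_RDMA_WRITE */
};

/* Channel semantics: the sender names only its local buffer.
 * The receiver decides where the data lands by posting a RECV. */
static struct work_request make_send(const void *buf, uint32_t len)
{
    struct work_request wr = { OP_SEND, buf, len, 0, 0 };
    return wr;
}

/* Memory semantics: the initiator must already know the remote
 * buffer's address and key before it can post the request. */
static struct work_request make_write(const void *buf, uint32_t len,
                                      uint64_t remote_addr, uint32_t rkey)
{
    struct work_request wr = { OP_RDMA_WRITE, buf, len, remote_addr, rkey };
    return wr;
}

/* Returns 1 if the request depends on remote address/key information. */
static int needs_remote_info(const struct work_request *wr)
{
    return wr->kind == OP_RDMA_WRITE;
}
```

In this model, `make_send(msg, len)` needs nothing from the peer, while `make_write(msg, len, addr, key)` cannot be built until `addr` and `key` have been obtained from the remote side somehow.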