The comments on the question already cover the retirement rate, which is the throughput at which instructions can retire once they are the oldest un-retired instructions. This seems to be at least 4 instructions per cycle per thread on recent Intel (Skylake) and 8 instructions per cycle per core on AMD (Ryzen).
This rate is at least as wide as other pipeline limits such as renaming (4 on recent Intel, 5 or 6 on recent AMD), so it is rarely the bottleneck. It is also hard to measure directly, since most tests will bottleneck on something else before they reach the maximum retirement rate.
It seems like that might not be your question, though, since you wrote:
"how long it takes to retire an instruction after it has left its execution port assuming no delays"
It isn't clear what you mean by "no delays", but that is a totally different question: how long it takes depends on how many instructions are ahead of it waiting to retire, and on how long those take to retire. In the worst case, the oldest instruction is stalled (e.g., on a long-latency miss to DRAM), and then retirement of any younger instruction could take 100 ns or more. Maybe that violates your "no delays" rule, though? In the general case, an instruction has to wait for all earlier instructions to retire, which may take many cycles even when things are flowing smoothly.
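To make the in-order constraint concrete, here is a toy model (not a description of any real core). It assumes each instruction has a known cycle at which it finishes executing, that it may retire in that same cycle at the earliest, and that at most `width` instructions retire per cycle. The function name `retire_cycles` and all cycle numbers are made up for illustration.

```python
def retire_cycles(complete, width):
    """Return the cycle in which each instruction retires.

    complete[i] is the (hypothetical) cycle at which instruction i, in
    program order, finishes executing. Retirement is in order and limited
    to `width` instructions per cycle.
    """
    retired = []
    for i, done in enumerate(complete):
        earliest = done  # cannot retire before it has executed
        if i >= 1:
            # in-order: cannot retire before the previous instruction
            earliest = max(earliest, retired[i - 1])
        if i >= width:
            # width limit: the instruction `width` slots earlier must have
            # retired in a strictly earlier cycle
            earliest = max(earliest, retired[i - width] + 1)
        retired.append(earliest)
    return retired


# Smooth flow: everything finished executing at cycle 1, width 4.
print(retire_cycles([1] * 8, 4))
# [1, 1, 1, 1, 2, 2, 2, 2]

# Oldest instruction stalls until cycle 300 (e.g., a DRAM miss);
# the younger ones finished at cycle 1 but still have to wait for it.
print(retire_cycles([300] + [1] * 8, 4))
# [300, 300, 300, 300, 301, 301, 301, 301, 302]
```

In the second case the younger instructions spend roughly 300 cycles between leaving their execution port and retiring, purely because of the one ahead of them, which is why "time from execution to retirement" has no single answer.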