A method which is hacky and time consuming but generic (even for softwares not providing (good) scripting), could be to record your screen at high speed (such as 60 fps). Then you look at the frames to count the frames between giving the order (click, enter key) and the result (updated display).
The precision will be in the order of 1 / recording frequency (16 ms if recording at 60 fps). A drawback is that you are likely to measure the time of more than just the operation you are interested in, for instance if you want to bench the loading of a file, you will also measure the time it took to render it after (which may/should be negligible).
I was able to apply this method using https://github.com/SerpentAI/D3DShot (increase the framebuffer size which by default last 1 second). Note that the frame numbers when exporting to files are going backward in time.
It may be possible to make this method less hacky by using computer vision algorithms to not have to count frames manually.