3

I am trying to profile an x86 Assembly program using Ubuntu 12.04. I'd like to use the rdtsc function. The problem is, according to a comment, that I should get the number of cycles in rdx but with the following code I get a too high number:

SECTION .bss

SECTION .dat

SECTION .text


global main         

main:           
nop

cpuid
rdtsc
shl rdx, 32
or rdx, rax
mov r8, rdx

xor esi,esi
mov esi,19        ; instructions to be monitored


cpuid
rdtsc
shl rdx, 32
or rdx, rax
sub rdx, r8

Running it in a debugger I get the following results on registers after the sub instruction:

rax     0xd88102bc
rbx     0x0
rcx     0xf0
rdx     0x44f3914a0
rsi     0x13
rdi     0x1
rbp     0x0
rsp     0x7fffffffdf38
r8      0x11828947ee1c

I can't figure out why the number of cycles in rdx is so high for so simple instructions. Is the right number in rcx? Isn't it too high too?

Thanks in advance

Luigi Tiburzi
  • 4,265
  • 7
  • 32
  • 43

1 Answers1

8

I'm not sure what's happening, but when you're calling C functions from assembler you should usually prefix them with a leading underscore, for example call _clock. This is because the C compiler will add this prefix to all functions it generates.

Additionally as you're on a 64-bit architecture the 64-bit result should end up in rax, you should ensure you're looking at that, not eax and ebx.

Finally I'd suggest rather than using clock you should use the assembler instruction rdtsc. This will return a 64-bit result in edx:eax. It's relative rather than absolute and is measured in cycles rather than some fractions of seconds, but it should be exactly what you need for profiling.

Example:

cpuid
rdtsc
shl rdx, 32
or rdx, rax
mov r8, rdx
<expensive assembler code>
cpuid
rdtsc
shl rdx, 32
or rdx, rax
sub rdx, r8

This will leave the number of ticks that elapsed in rdx. The cpuid instructions are to prevent the processor from reordering instructions around the profiling points.

jleahy
  • 16,149
  • 6
  • 47
  • 66
  • I've tried the way you suggest but I had to put some modifications due to opcode operand memory errors. First I tried to make the add instructions add edx,eax and then add rdx,rax because having the operand of different sizes gives errors. I got in rdx respectively xa5e996bc and 2c4b89fe072. Aren't they too high values if I put your instructions between a simple xor and assignment ones? Which is the correct one to be used? I'd say the one with eax,edx but the number seems to me still too high – Luigi Tiburzi Sep 30 '12 at 21:12
  • You're right about the mistake in the assembler code. I've corrected it so that it clears `rax` first, then adds the whole of `rax`, not just `eax`. You can't just do `add eax, edx` because the carry won't propagate correctly. If you're doing this correctly then the result in `rdx` should be a relatively small number. Make sure you're not clobbering `rcx` in your code in the middle. – jleahy Oct 01 '12 at 08:46
  • Excuse me, probably I'm not getting something right. I did exactly as you wrote and I got in rdx a number of the order 1000 000 000 000 000. – Luigi Tiburzi Oct 01 '12 at 13:52
  • ...As I noticed that rcx got clobbered by the second cpuid instruction I tried to push it just before the second cpuid and to pop it just before the sub instruction to make the subtraction with the old value. In this case I got a number of the order 22 000 000 000 which, if I have understood right what I'm doing, means on a 2.86 GHz cpu that the whole process should take more or less 10 seconds. In fact my process take the time of an eye-blinking. Sorry again but I'm rather puzzled by this behavior, I'm not sure I'm doing right... – Luigi Tiburzi Oct 01 '12 at 13:52
  • I checked the Intel manual and found in 64-bit mode rdtsc clears the upper parts of `rax` and `rdx`, so updated the code again. I also used `r8` to prevent `cpuid` clobbering it. Why it still isn't working is a bit of a mystery to me. That number is just a bit over 32-bits long, it's almost like we have the high and low halves the wrong way round. Was the result a multiple of 2^32? – jleahy Oct 01 '12 at 17:40
  • I'm awfully sorry to put yourself into troubles... this is still not working. The number I get, even if it changes slightly, is 24b2a4d30 which is not a multiple of 2^32 and is, to me, too high. I really don't understand why. This is my "expensive" assembly code: xor esi,esi -- mov esi,19 -- mov dword [variable],0. Is it possible that something in the previous instructions make something "damaging"? Thanks again – Luigi Tiburzi Oct 02 '12 at 07:27
  • Your code looks fine. That number looks suspiciously like `(2<<32)|(32-bits of garbage)`. I don't know why this would be. Have you tried single stepping the code with a debugger? – jleahy Oct 02 '12 at 13:52
  • I'm sure the problem is me being an absolute assembly beginner... Maybe there is something in the code I'm doing in the wrong way and that garbage is the result. I follow the steps with gdb in Emacs and all seems normal, or at least it seems normal to the eyes of a beginner :-). There is something I should check in particular? – Luigi Tiburzi Oct 02 '12 at 17:35
  • 1
    I'm not sure. I'd suggest you post a minimal yet complete example as a new question. That'll make it easier for people to look at. If it turns out there's a problem with the code in this answer I'll edit it or you can. – jleahy Oct 02 '12 at 21:07