
Background:

I am new to assembly. When I was learning programming, I made a program that implements multiplication tables up to 1000 * 1000. The tables are formatted so that each answer is on line factor1 << 10 | factor2 (I know, I know, it isn't pretty). These tables are then loaded into an array: int* tables. Empty lines are filled with 0. Here is a link to the file for the tables (7.3 MB). I know using assembly won't speed this up by much, but I just wanted to do it for fun (and a bit of practice).

Question:

I'm trying to convert this code into inline assembly (tables is a global):

int answer;
// ...
answer = tables [factor1 << 10 | factor2];
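For concreteness, here is a minimal sketch of the layout described above. It assumes both factors lie in 0..1000 (so each fits in 10 bits, since 1000 < 1024) and that each slot holds the product. The helper names `table_index` and `build_tables` are hypothetical, and the real program loads its values from the file rather than computing them:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Each factor fits in 10 bits, so the packed index needs 20 bits.
constexpr int kBits = 10;
constexpr std::size_t kSize = std::size_t{1} << (2 * kBits);  // 1,048,576 slots

// Hypothetical helper: pack two factors into one index,
// exactly as in tables[factor1 << 10 | factor2].
constexpr int table_index(int factor1, int factor2) {
    return factor1 << kBits | factor2;
}

// Hypothetical helper: store every product at its packed index. Slots that
// no (factor1, factor2) pair maps to stay 0, like the empty lines in the file.
std::vector<int> build_tables() {
    std::vector<int> tables(kSize, 0);
    for (int f1 = 0; f1 <= 1000; ++f1)
        for (int f2 = 0; f2 <= 1000; ++f2)
            tables[table_index(f1, f2)] = f1 * f2;
    return tables;
}
```

With this packing, `build_tables()[table_index(89, 786)]` holds `89 * 786`. The `std::vector` is only a convenience for the sketch; the question's global is a raw `int*`.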

This is what I came up with:

asm volatile ( "shll $10, %1;"
           "orl %1, %2;"
           "movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );

My C++ code works fine, but my assembly fails. What is wrong with my assembly (especially the movl _tables(,%2,4), %0; part) compared to my C++?

What I have done to solve it:

I used some random numbers: 89 and 786 as factor1 and factor2. I know that there is an element at 89 << 10 | 786 (which is 91922); I verified this with C++. When I run it in gdb, I get a SIGSEGV:
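The index arithmetic can be checked at compile time. Shifting left by 10 multiplies by 1024 and leaves the low 10 bits zero; because 786 < 1024, the OR only fills those zero bits, so here it behaves like an addition. A minimal sketch using `static_assert`:

```cpp
#include <cassert>

// factor1 << 10 is factor1 * 1024, with the low 10 bits cleared.
static_assert((89 << 10) == 89 * 1024, "a left shift by 10 multiplies by 1024");
static_assert((89 << 10) == 91136, "89 * 1024 == 91136");

// factor2 < 1024 fits entirely in those zero bits, so OR acts like +.
static_assert(((89 << 10) | 786) == (89 << 10) + 786, "OR acts as addition here");
static_assert(((89 << 10) | 786) == 91922, "the packed index from the question");
```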

Program received signal SIGSEGV, Segmentation fault.

at this line:

"movl _tables(,%2,4), %0;" : "=r" (answer) : "r" (factor1), "r" (factor2) );

I added two methods around my asm, which is how I know where the asm block is in the disassembly.

Disassembly of my asm block:

The disassembly from objdump -M att -d looks fine (although I'm not sure; as I said, I'm new to assembly):

402096: 8b 45 08                mov    0x8(%ebp),%eax
402099: 8b 55 0c                mov    0xc(%ebp),%edx
40209c: c1 e0 0a                shl    $0xa,%eax
40209f: 09 c2                   or     %eax,%edx
4020a1: 8b 04 95 18 e0 47 00    mov    0x47e018(,%edx,4),%eax
4020a8: 89 45 ec                mov    %eax,-0x14(%ebp)

The disassembly from objdump -M intel -d:

402096: 8b 45 08                mov    eax,DWORD PTR [ebp+0x8]
402099: 8b 55 0c                mov    edx,DWORD PTR [ebp+0xc]
40209c: c1 e0 0a                shl    eax,0xa
40209f: 09 c2                   or     edx,eax
4020a1: 8b 04 95 18 e0 47 00    mov    eax,DWORD PTR [edx*4+0x47e018]
4020a8: 89 45 ec                mov    DWORD PTR [ebp-0x14],eax

From what I understand, it moves the first parameter of my void calc ( int factor1, int factor2 ) function into eax, then moves the second parameter into edx. Then it shifts eax left by 10 and ORs it into edx. A 32-bit integer is 4 bytes, so the element's address is [edx*4+base_address]. It moves the element into eax and then stores eax into int answer (which, I'm guessing, is at -0x14 on the stack). I don't really see a problem.
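One way to see exactly where this instruction reads is to evaluate the AT&T memory operand disp(base,index,scale), whose effective address is disp + base + index*scale, with the numbers from the question: disp is 0x47e018 (the address objdump shows for _tables), there is no base register, %edx holds 91922, and the scale is 4. The helper `effective_address` is hypothetical and only does that arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Effective address of the AT&T operand disp(base,index,scale):
// disp + base + index * scale. A missing base register counts as 0.
constexpr std::uint32_t effective_address(std::uint32_t disp, std::uint32_t base,
                                          std::uint32_t index, std::uint32_t scale) {
    return disp + base + index * scale;
}

// movl 0x47e018(,%edx,4), %eax with %edx == 91922 (89 << 10 | 786):
static_assert(effective_address(0x47e018, 0, 91922, 4) == 0x4d7c60,
              "the handwritten asm reads 367,688 bytes past 0x47e018");
```

So the handwritten instruction loads from 0x4d7c60, which is 91922 * 4 bytes past the address of the symbol _tables itself. Whether that is a valid element address depends on what is actually stored at 0x47e018.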

Disassembly of the compiler's .exe:

When I replace the asm block with plain C++ (answer = tables [factor1 << 10 | factor2];) and disassemble it, this is what I get in Intel syntax:

402096: a1 18 e0 47 00          mov    eax,ds:0x47e018
40209b: 8b 55 08                mov    edx,DWORD PTR [ebp+0x8]
40209e: c1 e2 0a                shl    edx,0xa
4020a1: 0b 55 0c                or     edx,DWORD PTR [ebp+0xc]
4020a4: c1 e2 02                shl    edx,0x2
4020a7: 01 d0                   add    eax,edx
4020a9: 8b 00                   mov    eax,DWORD PTR [eax]
4020ab: 89 45 ec                mov    DWORD PTR [ebp-0x14],eax

AT&T syntax:

402096: a1 18 e0 47 00          mov    0x47e018,%eax
40209b: 8b 55 08                mov    0x8(%ebp),%edx
40209e: c1 e2 0a                shl    $0xa,%edx
4020a1: 0b 55 0c                or     0xc(%ebp),%edx
4020a4: c1 e2 02                shl    $0x2,%edx
4020a7: 01 d0                   add    %edx,%eax
4020a9: 8b 00                   mov    (%eax),%eax
4020ab: 89 45 ec                mov    %eax,-0x14(%ebp)

I am not really familiar with the Intel syntax, so I am just going to try and understand the AT&T syntax:

It first moves the base address of the tables array into %eax. Then it moves the first parameter into %edx, shifts %edx left by 10, and ORs it with the second parameter. Shifting %edx left by two multiplies it by 4. Then it adds that to %eax (the base address of the array). So basically it just did this: [edx*4+0x47e018] (Intel syntax), or 0x47e018(,%edx,4) (AT&T). It moves the value of the element it got into %eax and puts it into int answer. This method is more "expanded", but it does the same thing as my hand-written assembly! So why is mine giving a SIGSEGV while the compiler's works fine?
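The difference shows up once `tables` is modeled both ways. When a global is a true array, the address of its symbol is the address of element 0, so symbol(,%reg,4) addressing reaches the data directly (on a 32-bit non-PIE build like this one). When it is a pointer, as `int* tables` is here, the symbol's address is where the pointer itself is stored, so the pointer has to be loaded first, which is what the compiler's `mov 0x47e018,%eax` does. A minimal sketch with hypothetical globals `table_array` and `table_ptr`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical globals modeling the two ways `tables` could be declared.
int  table_array[4] = {10, 20, 30, 40};  // symbol address == address of element 0
int* table_ptr      = table_array;       // symbol address == address of the pointer

// Mirrors the compiler's sequence for tables[index] when tables is a pointer:
//   mov 0x47e018,%eax   load the pointer value stored at the symbol
//   shl $0x2,%edx       scale the index by sizeof(int)
//   add %edx,%eax       add the byte offset to the loaded pointer
//   mov (%eax),%eax     load the element (the second dereference)
int load_via_pointer(std::size_t index) {
    const char* base = reinterpret_cast<const char*>(table_ptr);       // first load
    return *reinterpret_cast<const int*>(base + index * sizeof(int));  // second load
}

// True when "symbol(,%reg,4)" addressing would reach the data directly,
// i.e. when the symbol's own address is also the address of element 0.
bool symbol_is_data(const void* symbol_address, const void* first_element) {
    return symbol_address == first_element;
}
```

For `table_array` the two addresses coincide; for `table_ptr` they do not, which is exactly the gap between the handwritten asm and the compiler's code.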

  • Why are you trying to do a compiler's job? – Tony The Lion Mar 13 '13 at 14:39
  • @TonyTheLion just read the question: _"I know using assembly won't speed up this by much, but I just wanted to do it for fun (and a bit of practice)."_ – stefan Mar 13 '13 at 14:40
  • "someone is trying to be a compiler" sounded way funnier, @Tony. – Bartek Banachewicz Mar 13 '13 at 14:40
  • The `volatile` part of `asm volatile` tells the compiler that it cannot move your piece of assembly at all, and it cannot move other parts of the code around your assembly either. This prevents the compiler's built-in optimizer from doing its job. You should not specify your assembly as `volatile` unless you are sure you need to do so and you understand why. A good example of something that needs it is `rdtsc`, because you don't want the compiler moving that code or it will time the wrong thing. – SoapBox Mar 13 '13 at 14:53
  • @SoapBox Thanks for pointing that out, I'll remove it. –  Mar 13 '13 at 14:59

2 Answers


I bet (from the disassembly) that tables is a pointer to an array, not the array itself.

So you need:

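 asm volatile ( "movl _tables, %%eax;"    // load the pointer value stored in tables
                "shll $10, %1;"
                "orl %1, %2;"
                "movl (%%eax,%2,4), %0;"   // dereference again to get the element
                : "=r" (answer), "+r" (factor1), "+r" (factor2)  // %1 and %2 are modified
                :
                : "eax", "memory" );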

(Don't forget the extra clobber in the last line).

There are, of course, variations; this one may be more efficient if the code is in a loop:

 asm volatile ( "shll $10, %1;"
                "orl %1, %2;"
                "movl (%3,%2,4), %0;"      // %3 holds the pointer value of tables
                : "=r" (answer), "+r" (factor1), "+r" (factor2)
                : "r" (tables)
                : "memory" );
Mats Petersson
  • Thank you very much! There was a comma at the end of the `movl` that was confusing the compiler. Other than that, it worked very well! –  Mar 13 '13 at 14:58
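The code in this answer targets 32-bit x86, like the question. As a hedged sketch only (not part of the answer above), the register-operand variant might be adapted for a 64-bit GCC or Clang target as below; the function name `lookup_asm` is hypothetical, and a plain C++ fallback keeps it compiling elsewhere. On x86-64 the index must appear as a 64-bit register inside the address: the `q` operand modifier prints that name, and writing a 32-bit register (as `shll` and `orl` do) zero-extends it, so the upper half is already clear. `factor1` is declared `"+r"` because the asm overwrites it, and the `"memory"` clobber tells the compiler the asm reads memory it was not given as an operand.

```cpp
#include <cassert>

// Hypothetical helper: returns tables[factor1 << 10 | factor2].
inline int lookup_asm(const int* tables, unsigned factor1, unsigned factor2) {
#if defined(__GNUC__) && defined(__x86_64__)
    int answer;
    __asm__ ("shll $10, %1\n\t"             // factor1 <<= 10 (32-bit op zero-extends)
             "orl  %2, %1\n\t"              // factor1 |= factor2
             "movl (%3,%q1,4), %0"          // answer = tables[index]; %q1 is the 64-bit name
             : "=r" (answer), "+r" (factor1)  // factor1 is overwritten, so "+r"
             : "r" (factor2), "r" (tables)
             : "memory");                   // the asm reads *tables
    return answer;
#else
    return tables[factor1 << 10 | factor2];  // portable fallback for other targets
#endif
}
```

Usage follows the original expression: `answer = lookup_asm(tables, factor1, factor2);`.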

This is intended to be an addition to Mats Petersson's answer; I wrote it simply because it wasn't immediately clear to me why OP's analysis of the disassembly (that his assembly and the compiler-generated one were equivalent) was incorrect.

As Mats Petersson explains, the problem is that tables is actually a pointer to an array, so to access an element, you have to dereference twice. Now to me, it wasn't immediately clear where this happens in the compiler-generated code. The culprit is this innocent-looking line:

a1 18 e0 47 00          mov    0x47e018,%eax

To the untrained eye (that includes mine), this might look like the value 0x47e018 is moved to eax, but it's actually not. The Intel-syntax representation of the same opcodes gives us a clue:

a1 18 e0 47 00          mov    eax,ds:0x47e018

Ah, ds: (a segment prefix), so it's not actually a value but an address!

For anyone who is wondering now, the following would be the opcodes and AT&T syntax assembly for moving the value 0x47e018 to eax:

b8 18 e0 47 00          mov    $0x47e018,%eax
us2012
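The difference between the two encodings can also be checked mechanically. In 32-bit code, opcode A1 is MOV EAX, moffs32 (load EAX from the absolute address that follows, through DS by default, hence the ds: in Intel syntax), while B8 is MOV EAX, imm32 (put the following constant itself into EAX). In both cases the four bytes after the opcode are a little-endian 32-bit value, which is why 18 e0 47 00 reads as 0x0047e018. A minimal sketch with a hypothetical decoder that only knows these two opcodes:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mini-decoder for the two 5-byte encodings discussed above.
enum class MovKind { LoadFromAddress, Immediate, Unknown };

struct MovEax {
    MovKind kind;
    std::uint32_t operand;  // the address (A1) or the constant (B8)
};

MovEax decode_mov_eax(const std::uint8_t bytes[5]) {
    // The 32-bit operand is stored little-endian: lowest byte first.
    std::uint32_t value = std::uint32_t{bytes[1]} | std::uint32_t{bytes[2]} << 8 |
                          std::uint32_t{bytes[3]} << 16 | std::uint32_t{bytes[4]} << 24;
    switch (bytes[0]) {
        case 0xA1: return {MovKind::LoadFromAddress, value};  // mov 0x47e018,%eax
        case 0xB8: return {MovKind::Immediate, value};        // mov $0x47e018,%eax
        default:   return {MovKind::Unknown, 0};
    }
}
```

Both byte sequences carry the same 0x47e018; only the opcode decides whether it is used as an address or as the value itself.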