5

I have some sample source code for OpenGL that I want to compile as a 64-bit version (using Delphi XE2), but it contains some ASM code that fails to compile, and I know nothing about ASM. Here is the code, with the two error messages on the lines that fail:

// Copy a pixel from source to dest and Swap the RGB color values
procedure CopySwapPixel(const Source, Destination: Pointer);
asm
  push ebx //[DCC Error]: E2116 Invalid combination of opcode and operands
  mov bl,[eax+0]
  mov bh,[eax+1]
  mov [edx+2],bl
  mov [edx+1],bh
  mov bl,[eax+2]
  mov bh,[eax+3]
  mov [edx+0],bl
  mov [edx+3],bh
  pop ebx //[DCC Error]: E2116 Invalid combination of opcode and operands
end;
PhiS
Jerry Dodge
  • You will need to write a 64bit version of your ASM instructions, and use `{$IFDEF WIN64}` to tell the compiler which set of ASM instructions to use for the given target platform. – LaKraven May 22 '12 at 03:23
  • Thanks but the key is I know nothing about ASM to know how to write it. – Jerry Dodge May 22 '12 at 03:29
  • Found something here: http://docwiki.embarcadero.com/RADStudio/en/Converting_32-bit_Delphi_Applications_to_64-bit_Windows - it says that "asm is not supported in 64bit XE2" – Jerry Dodge May 22 '12 at 05:03
  • @Jerry Dodge I've added a pure Pascal version – MBo May 22 '12 at 05:17
  • @JerryDodge This is not true at all. 64-bit XE2 does not support 32-bit x86 asm blocks, by definition. But 64-bit XE2 does support x64 assembler. You cannot write asm blocks within functions, but you can write plain functions or methods in asm. The difficult part is [handling exceptions and the stack properly](http://www.bitcommander.de/blog/index.php/2011/08/29/xe2-win64-osx-jcldebug/). – Arnaud Bouchez May 22 '12 at 05:31
  • @ArnaudBouchez Thanks for clarifying, as mentioned, I don't even know the first bit to know about Assembly. – Jerry Dodge May 22 '12 at 05:32
  • @vhanla Thank you, I'm sure that will be valuable to someone, but it's complete Greek to me :-) – Jerry Dodge Nov 13 '15 at 19:05

2 Answers

13

This procedure swaps ABGR byte order to ARGB and vice versa.
In 32-bit, this code should do the whole job:

mov ecx, [eax]  //ABGR from src
bswap ecx       //RGBA  
ror ecx, 8      //ARGB 
mov [edx], ecx  //to dest

The correct code for x64 (Win64, where Source arrives in RCX and Destination in RDX) is:

mov ecx, [rcx]  //ABGR from src
bswap ecx       //RGBA  
ror ecx, 8      //ARGB 
mov [rdx], ecx  //to dest

Another option is a pure Pascal version, which changes the order of the bytes in array representation from 0123 to 2103 (it swaps bytes 0 and 2):

procedure Swp(const Source, Destination: Pointer);
var
  s, d: PByteArray;
begin
  s := PByteArray(Source);
  d := PByteArray(Destination);
  d[0] := s[2];
  d[1] := s[1];
  d[2] := s[0];
  d[3] := s[3];
end;
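
The comments on this answer suggest making the routine `inline` and taking `PByteArray` parameters directly. A minimal sketch of that idea follows, assuming 4-byte pixels stored back to back. The names `SwapPixel`, `SwapPixels` and `SwapPixelsDemo` are hypothetical and not part of the original sample. Unlike `Swp` above, this variant saves byte 2 in a local first, so it also works when source and destination are the same pixel.

```pascal
program SwapPixelsDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils; // PByteArray, Format

// Hypothetical helper: copies one 4-byte pixel and swaps bytes 0 and 2.
// Byte 2 is read before anything is written, so Source and Destination
// may point to the same pixel.
procedure SwapPixel(Source, Destination: PByteArray); inline;
var
  B2: Byte;
begin
  B2 := Source^[2];
  Destination^[2] := Source^[0];
  Destination^[1] := Source^[1];
  Destination^[3] := Source^[3];
  Destination^[0] := B2;
end;

// Hypothetical helper: converts Count consecutive pixels in place.
procedure SwapPixels(Pixels: PByte; Count: Integer);
var
  I: Integer;
begin
  for I := 0 to Count - 1 do
  begin
    SwapPixel(PByteArray(Pixels), PByteArray(Pixels));
    Inc(Pixels, 4); // advance to the next 4-byte pixel
  end;
end;

var
  Buf: array[0..7] of Byte = ($11, $22, $33, $44, $55, $66, $77, $88);
begin
  SwapPixels(@Buf[0], 2);
  // First pixel after the swap: 33 22 11 44
  Writeln(Format('%.2x %.2x %.2x %.2x', [Buf[0], Buf[1], Buf[2], Buf[3]]));
end.
```

Because this is plain Pascal, the same source should build for both the 32-bit and the 64-bit target without any conditional compilation.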
Johan
MBo
  • The asm version won't work in 64-bit, as you stated. The best option is to use Pascal. For better performance: make this procedure `inline` and do not use temporary variables, but directly change the signature to `Source, Destination: PByteArray`. +1 in all cases for the much better x86 asm coding than the awful original asm code (slower than Pascal). If my train had not been late this morning, I'd have posted a similar version (using eax instead of ecx, which may be a bit faster). In all cases, the best performance will come from unrolling the loop and using SSE2 instructions. – Arnaud Bouchez May 22 '12 at 05:27
  • PS - the 4 ASM lines above do in fact work, but I went with the pascal version anyway for ease of personal readability and understanding :D – Jerry Dodge May 22 '12 at 05:42
  • @Arnaud Bouchez The BDS2006 compiler doesn't use real temporary variables and does all the work in registers. But you are right in general. And the idea about SSE could be useful, because this transformation is typical for bulk data processing. – MBo May 22 '12 at 05:42
  • @JerryDodge Asm code works, but does it produce right result? (I cannot check 64bit) – MBo May 22 '12 at 05:45
  • It appears to, unless the actual implementation of it doesn't even represent it... Honestly I have no clue what it's really doing, that's what I'm trying to figure out, but the image appears fine. – Jerry Dodge May 22 '12 at 06:18
  • That asm code can't be right in 64 bits because it truncates the pointers to 32 bits. And it reads from the wrong registers. Just because it works in a simple test does not mean it is correct for all input. – David Heffernan May 22 '12 at 07:02
  • procedure CopySwapPixel(const Source, Destination: Pointer); asm mov ecx, [Source] //ABGR from src bswap ecx //RGBA ror ecx, 8 //ARGB mov [Destination], ecx //to dest end; – pani May 22 '12 at 12:00
  • WARNING: the asm code proposed by pani is not right on x64. You'll need a `.noframe` pseudo-op first, and even then it will not give the best speed. A pure Pascal + `inline` version would be faster than this! See [this link about x64 asm in Delphi XE2](http://blogs.embarcadero.com/abauer/2011/10/10/38940). – Arnaud Bouchez May 22 '12 at 15:18
  • @Arnaud Bouchez OK, removed. I'd better stop modifications without the possibility of testing on x64 system ;) – MBo May 22 '12 at 15:43
3

64-bit uses different names for the pointer registers, and parameters are passed differently. On Win64, the first four integer or pointer parameters to assembler functions are passed in RCX, RDX, R8, and R9, respectively.

EBX -> RBX
EAX -> RAX
EDX -> RDX

Try this:

procedure CopySwapPixel(const Source, Destination: Pointer);
{$IFDEF CPUX64}
asm
  mov al,[rcx+0]
  mov ah,[rcx+1]
  mov [rdx+2],al
  mov [rdx+1],ah
  mov al,[rcx+2]
  mov ah,[rcx+3]
  mov [rdx+0],al
  mov [rdx+3],ah
end;
{$ELSE}
asm
  push ebx // ebx must be preserved under the 32-bit register calling convention
  mov bl,[eax+0]
  mov bh,[eax+1]
  mov [edx+2],bl
  mov [edx+1],bh
  mov bl,[eax+2]
  mov bh,[eax+3]
  mov [edx+0],bl
  mov [edx+3],bh
  pop ebx
end;
{$ENDIF}
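
For comparison, here is a hedged sketch that combines the points raised in this thread: a Win64-only asm body using the `.noframe` pseudo-op mentioned in the comments, with a pure Pascal body for every other target. The types `TPixelBytes` and `PPixelBytes` are hypothetical helpers introduced only for this example, and the asm branch assumes the Win64 convention described above (Source in RCX, Destination in RDX). It has not been benchmarked against the `inline` Pascal approach that the comments recommend.

```pascal
type
  TPixelBytes = array[0..3] of Byte; // hypothetical helper type: one 4-byte pixel
  PPixelBytes = ^TPixelBytes;

procedure CopySwapPixel(const Source, Destination: Pointer);
{$IFDEF WIN64}
asm
  .noframe          // leaf routine: no stack frame needed
  mov eax, [rcx]    // load all 4 bytes of the source pixel (Source is in RCX)
  bswap eax         // reverse the byte order
  ror eax, 8        // rotate so that only bytes 0 and 2 end up exchanged
  mov [rdx], eax    // store to the destination pixel (Destination is in RDX)
end;
{$ELSE}
var
  P: TPixelBytes;
begin
  // Copy the pixel first so that Source and Destination may be the same address.
  P := PPixelBytes(Source)^;
  PPixelBytes(Destination)^[0] := P[2];
  PPixelBytes(Destination)^[1] := P[1];
  PPixelBytes(Destination)^[2] := P[0];
  PPixelBytes(Destination)^[3] := P[3];
end;
{$ENDIF}
```

Both branches read the whole source pixel before writing, so calling `CopySwapPixel(@Pixel, @Pixel)` to convert a pixel in place should be safe in either case.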
APZ28
  • I suspect the x64 compiler should not like that asm code. You'll need to specify that this asm procedure has no stack frame needed (a `.noframe` pseudo compiler instruction is needed at the beginning of the `asm...end` block). And it won't be faster than pure pascal. So IMHO the pure pascal version is to be recommended. It will also be ARM ready, for your next iPhone (or Android?) application. ;) – Arnaud Bouchez May 22 '12 at 15:16
  • @arnoud bouchez: if you use only asm..end (so without begin) the stackframe is already omitted. And since arm is big endian there's probably no need to swap at all ;-) – Remko May 22 '12 at 17:05
  • .noframe is just a way to help compiler skip generate stack instructions for passing parameters; it has nothing to do with compile or not compile. For 64 bit, you can not push a 32 bits register to stack same as 32 bits, you can not push 16 register (AX, DX...) to stack. If he change to push RBX, it will compile under 64 bit compiler but those asm codes are not correct – APZ28 May 22 '12 at 20:40
  • @Remko 1. The stackframe is not omitted, as far as [this reference article tells](http://blogs.embarcadero.com/abauer/2011/10/10/38940). 2. ARM assembler is completely different: this Intel/AMD code won't even compile. 3. The code works at byte level, so endianness does not affect anything here. 4. And this is not an endianness swap, but an RGBA pixel color swap. – Arnaud Bouchez May 23 '12 at 05:52
  • @APZ28 You are right about .noframe. See [Allen Bauer article](http://blogs.embarcadero.com/abauer/2011/10/10/38940). In all cases, using asm in such a sub-function is nonsense here: it adds complexity, and will be slower than an `inline`d pure Pascal version. In this context, ASM makes sense only if you explicitly use SSE2 instructions within an unrolled loop. Writing asm code that is less efficient than what the compiler generates does not make sense to me. – Arnaud Bouchez May 23 '12 at 05:55