1

This is my function prototype:

char* swap(char* array, int index1, int index2);

This is my assembly code:

segment .text
   global swap

swap:
   mov r14,[rdi+rsi]
   mov r15,[rdi+rdx]
   
   mov [rdi+rsi],r15        ;this line segfaults
   mov [rdi+rdx],r14
   
   mov rax,rdi
   
   ret

The lines mov [rdi+rsi],r15 and mov [rdi+rdx],r14 give me a segfault; I'm not sure where I'm going wrong

The calling function:

 #include <stdio.h>
 #include <stdlib.h>

 extern char* swapLetters(char* str, int indexA, int indexB);

 int main()
 {    
    char* st= "E X A M P L E";
    printf("Before swap: \t%s\n", st);

    char * res = swap(st, 2 ,10);

    printf("After swap: \t%s\n", res);
    return 0;
}

Expected output:

Before swap: E X A M P L E

After swap: E L A M P X E

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Darthvader
  • 35
  • 3
  • Are both indexes inside your `array`? – Vlad Feinstein Oct 08 '20 at 18:44
  • I thought that mov r14[rdi+rsi] would move the char at index rsi in the array, I'm not too sure about how moving 1 char. Thank you so much. – Darthvader Oct 08 '20 at 18:48
  • I have added the C code in the edited question – Darthvader Oct 08 '20 at 18:49
  • 1
    Why not write the c code and compile it and look at the generated code? There are online sites that let you do that – Support Ukraine Oct 08 '20 at 18:50
  • Thank you so much for the help. Really appreciate it – Darthvader Oct 08 '20 at 19:15
  • 1
    @Darthvader what is the point of useng assembler? C compiler generates it wihout any problems https://godbolt.org/z/G7eY9Y – 0___________ Oct 08 '20 at 19:38
  • Please show the C implementation that you have that results in that assembly code. And, fwiw, changing all of the characters in an array of characters (as indicated by your example result) requires some kind of iterative algorithm, i.e. a `while(buf){...}` or a `for(i=0;i – ryyker Oct 08 '20 at 21:04
  • @4386427: The best example being https://godbolt.org/ which is pretty easy to type. Giving beginners an actual URL is more likely to get them to actually use it. – Peter Cordes Oct 08 '20 at 22:01
  • @Darth: Note that your C example has a prototype for `extern char* swapLetters`, not the function you actually called, `swap`. C *does* allow calling unprototyped functions, but it's discouraged and should give a warning. If you can change the prototype to taking `uint64_t` or `size_t` args, that would be better; then your asm wouldn't have to sign-extend the 32-bit index args before using (which you didn't do) to correctly follow the calling convention. (See [my comment](https://stackoverflow.com/questions/64268543/fu/64269363?noredirect=1#comment113652024_64269608)) – Peter Cordes Oct 08 '20 at 22:20
  • @DarthVader : there are now a couple of answers that try to answer your question. You can accept one of them as an answer to let others know in the future this question has a resolution. More on the how and why of accepting an answer on SO can be found here: https://meta.stackexchange.com/a/5235/271768 . Upvoting and accepting an answer are different things. – Michael Petch Oct 21 '20 at 00:48

2 Answers2

3

The primary problem is that your st variable is defined as a pointer to a string literal.

char* st= "E X A M P L E";

String literals in the C language are considered read-only. To modify such a string is undefined behaviour. What happens is unknown and will be specific to the compiler and the environment it runs in. Your environment is raising an exception when you go to write that memory in the assembly code. On most modern OSes using modern compilers the string literals are placed in memory that isn't writeable so that it will generate an exception, which is what happened in your case.

If you wish to create a character array in writeable memory you can define st this way:

char st[] = "E X A M P L E";

Issues with the Assembly Code

One issue is that your indices to the function swap are int. In 64-bit GCC/CLANG int is 32-bits. If you pass 32-bit signed int to the assembly code the top 32-bits may have garbage in them. Given that your indices are never negative you should use an unsigned type and preferably one that is 64-bit. I would recommend the size_t type instead. size_t will be unsigned and 64-bit in size in x86-64 code, so when passed to the assembly code you don't need to sign/zero extend the index values to 64-bits before using them. I'm recommending changing swap to be:

char* swap(char* array, size_t index1, size_t index2)

If you keep index1 and index2 as signed integers (int) the beginning of your assembly code would have to use MOVSX on both ESI and EDX registers. That code would look like:

swap:
    movsx rsi, esi        ; Sign extend 32-bit index1 parm in ESI to 64-bits
    movsx rdx, edx        ; Sign extend 32-bit index2 parm in EDX to 64-bits
    ; rest of function here

If you were to have used 32-bit unsigned int for index and index2 you would have had to zero extend the 32-bit values with:

    mov esi, esi          ; Zero extend 32-bit index1 parm in ESI to 64-bits
    mov edx, edx          ; Zero extend 32-bit index2 parm in EDX to 64-bits
    ; rest of function here

When the destination of an operation is a 32-bit register in 64-bit mode, the CPU automatically zeros the upper 32-bits of the destination register. Moving a 32-bit register like ESI to itself will clear the upper 32-bits of RSI. This is the same for all the general purpose registers.


RBX, RBP, and R12–R15 are non-volatile registers according to the x86-64 System V ABI. If your function modifies them their contents have to be preserved. You can push them on the stack and pop their original values off the stack when finished. The preferred way is to use one of the volatile registers that don't need to preserved like R8-R11, RAX, RCX, RDX, RDI, RSI.


When you move data to/from memory using a 64-bit register then 64 bits (8 bytes) will be transferred. As an example:

mov r14,[rdi+rsi]

Moves the 8 bytes starting at memory address [rdi+rsi] and moves it to 64-bit register R14. The write later on does something similar but updates 8 bytes in memory rather than one byte. Updating 8 bytes of data could smash the stack if the array of characters were placed on the stack, which happens to be the case in your code and environment.

When using the numbered registers R8 to R15 you can reference the low 8 bits by placing a b suffix on the end of the register name (w is for 16-bit word, d is for 32-bit double word). A complete chart of all the registers names in NASM/YASM syntax for 64-bit mode are:

enter image description here

mov r14,[rdi+rsi] would be written as mov mov r14b,[rdi+rsi] to move a single byte. You would have to make that change to each of the other moves as well.


Assuming you change index1 and index2 to have type size_t (or uin64_t) your assembly code could have been written as :

segment .text
global swap

swap:
   push r14           ; Save non-volatile registers we overwrite
   push r15

   mov r14b,[rdi+rsi] ; Move one byte from [rdi+rsi] to R14B. R14B is lower 8 bits of R14
   mov r15b,[rdi+rdx] ; Move one byte from [rdi+rdx] to R15B. R15B is lower 8 bits of R15
   mov [rdi+rsi],r15b ; Move the byte in R15B to [rdi+rsi]
   mov [rdi+rdx],r14b ; Move the byte in R14B to [rdi+rdx]
   mov rax,rdi

   pop r15            ; Restore non-volatile registers
   pop r14
   ret

If you were to use the other volatile registers rather than the non-volatile ones the code could have been simplified to:

segment .text
global swap

swap:    
   mov al,[rdi+rsi]   ; Move one byte from [rdi+rsi] to AL. AL is lower 8 bits of RAX
   mov cl,[rdi+rdx]   ; Move one byte from [rdi+rdx] to CL. CL is lower 8 bits of RCX
   mov [rdi+rsi],cl   ; Move the byte in CL to [rdi+rsi]
   mov [rdi+rdx],al   ; Move the byte in AL to [rdi+rdx]
   mov rax,rdi

   ret

In this case we use the lower 8 bits of the volatile registers RAX(AL) and RCX(CL) to do the swap. Since we don't have to preserve these registers there is no need to save and restore them.

Michael Petch
  • 46,082
  • 8
  • 107
  • 198
  • 2
    Final problem with the asm: the indices are only declared `int`; a caller is allowed to leave garbage in the high 32 bits, and you can get GCC to actually emit such a caller by passing along the low 32 bits of a wider integer variable where the full 64 bits gets used for something before the call. Best to declare them `size_t` or `uint64_t`, otherwise in the asm you need to `movsxd` sign-extend both inputs. (You could zero-extend both of them with `xchg esi, edx`, good code size, bad efficiency. Compilers would of course use 2x `mov`, probably `mov esi,esi` defeating mov-elimination.) – Peter Cordes Oct 08 '20 at 22:07
  • 2
    Just noticed that the C block declares a prototype for `swapLetters`, not `swap`, so they are actually calling it unprototyped unless that block at the top of the question is also getting included somehow. (int vs. size_t bugs like this are a great example of why testing can be insufficient; with constant integer args, the easiest way for the caller to pass them does zero-extend. So I guess a test case taking a pointer to the *end* of the string, with negative args, would be good. Or maybe the OP has a Java background and isn't aware of `unsigned`, and didn't intend specifically signed.) – Peter Cordes Oct 08 '20 at 22:26
0

Part of the problem here is that an area of non-writable memory is being used to write to, it will not work. (There are also other correctness problems with the asm, see @MichaelPetch's answer.)

When this is created:

char* st= "E X A M P L E";

Because it creates a string literal, the pointer st refers to a memory location that is not writable.

If created as:

char st[] = "E X A M P L E";

st stored in writable memory and its contents are the characters, instead of just holding a pointer to a read-only string literal.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
ryyker
  • 22,849
  • 3
  • 43
  • 87
  • 2
    I happen to be your upvote. It is possible someone might have downvoted as it was only one of the problems in the code. I'm unsure. At least from the _C_ side your answer seems fine. – Michael Petch Oct 08 '20 at 21:51
  • @MichaelPetch - Thank you. You will get one from me as soon as you get the _accept_. (Which btw I think you should.) Thanks also for dv insight, I do refer to assembly when something is working oddly, (created in Code::Blocks environment usually.) but beyond that, am not very conversant in it. – ryyker Oct 08 '20 at 21:52
  • 1
    I didn't downvote either, but pretty clearly "function that takes a char array with 2 indices" is talking about the function args. Clearer phrasing would have been "function that takes a char array *and* 2 indices". – Peter Cordes Oct 08 '20 at 21:58
  • @PeterCordes - I did not pick that up, but by the context of OP content, I think you are right. Thank you. (Unfortunately I am out of time for today, I will have to put this on hold until I can re-do a few things.) – ryyker Oct 08 '20 at 22:02
  • 1
    No worries, fixed that for you by deleting that last section, and I'll just reword the title. – Peter Cordes Oct 08 '20 at 22:13
  • @PeterCordes - Thank you. – ryyker Oct 09 '20 at 01:34