
I am writing a compiler in an attempt to switch my programming language from interpreted to compiled.

This is the code my script generated:

section .bss
  digitSpace resb 100
  digitSpacePos resb 8
    string_at_index_0 resb 12
    string_at_index_0_len resb 4

section .data

section .text
    global _start
_start:
   mov rax, "Hello world"
   mov [string_at_index_0], rax
   mov byte [string_at_index_0_len], 13

   mov rax, 1
   mov rdi, 1
   mov rsi, string_at_index_0
   mov rdx, string_at_index_0_len
   syscall

   mov rax, 60
   mov rdi, 0
   syscall

When I assemble this code with nasm -f elf64 -o test.o test.asm, I get this warning:

warning:character constant too long [-w+other]

Can anyone help me with this? If anyone could also suggest a better way to output "Hello world", that would be helpful too!

liveno
  • `mov rax, "Hello world"` probably isn't valid. Declare the string literal in the .bss segment and load its address instead. – 500 - Internal Server Error Oct 15 '22 at 16:20
  • @500-InternalServerError: It would be valid if it was 8 bytes or fewer; multi-byte character constants in NASM work the same as integer literals, with the first character in the lowest byte, e.g. `"ab"` is `(0x62<<8) | 0x61`. As is, it's just truncated with a warning. If registers were 12 bytes wide, this code would work, since it is storing the string bytes to memory and passing a pointer to it to a `write` syscall. It would also work to `push rax` / `mov rsi,rsp`. Of course it's much simpler to just put string literals in static storage in the first place, not store them from immediates. – Peter Cordes Oct 17 '22 at 05:36

1 Answer

mov rax, "Hello world"

RAX is a 64-bit (8-byte) register, and you are trying to put 11 bytes into it, so NASM truncates the constant and warns.

If you want to store immediate data to memory, you can mov rax, imm64 to put 8 bytes into RAX and then push it or store it. Or you can push "hi!" as a 32-bit immediate if you want.
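To make the push approach concrete, here is a minimal sketch that builds the 12-byte string "Hello world\n" on the stack from two immediates and passes `rsp` to `write`. It assumes x86-64 Linux and a static executable with `_start` as the entry point; the split into an 8-byte and a 4-byte piece is just one way to do it, not something the original code does.

```nasm
section .text
    global _start
_start:
    ; The stack grows downward, so push the tail of the string first.
    push  0x0a646c72        ; bytes 72 6c 64 0a = "rld\n"; push imm32 is sign-extended to 8 bytes
    mov   rax, "Hello wo"   ; exactly 8 bytes, so it fits in RAX without truncation
    push  rax               ; memory at [rsp] now reads "Hello world\n" followed by zero padding

    mov   eax, 1            ; __NR_write
    mov   edi, 1            ; stdout
    mov   rsi, rsp          ; pointer to the bytes we just pushed
    mov   edx, 12           ; 11 characters plus the newline
    syscall

    mov   eax, 60           ; __NR_exit
    xor   edi, edi
    syscall                 ; _exit(0)
```

This works, but it spends instructions at run time copying a constant, which is why static storage is the better choice for a compiler's string literals.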

You don't want to put the message itself in a register; you want to put a pointer to the message into rsi. Since the message is constant, you might as well start with it in a data section instead of an immediate, so you don't have to run instructions at run time to store it.

Here is a simple hello world:

section .data                 ; or .rodata

msg: db "Hello World", 10     ; including a `\n` newline
.len equ $ - msg              ; assemble-time constant

; equivalent to
; msg.len equ 12        ; because the distance between here and the start of msg is 12 bytes.

section .text
    global _start
_start:

   mov rax, 1       ; write call number, __NR_write from asm/unistd_64.h
   mov edi, 1       ; to stdout
   mov rsi, msg     ; pointer to message
   mov rdx, msg.len ; length of the message that we defined earlier
   syscall          ; write(1, "Hello World\n", 12)

   mov  eax, 60         ; __NR_exit
   xor  edi, edi
   syscall              ; _exit(0)

Ideally, your compiler should place string literals in the .rodata section (read-only data) and pass pointers to them when using them in functions.
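As a rough sketch of what such compiler output might look like, the listing below emits two string literals into .rodata, each with an assemble-time length constant, and prints both. The `string_at_index_N` naming follows the question's generated code; the second literal ("Goodbye") and the one-label-per-literal scheme are assumptions for illustration only.

```nasm
section .rodata
string_at_index_0:      db "Hello world", 10
string_at_index_0_len   equ $ - string_at_index_0   ; 12, computed by the assembler
string_at_index_1:      db "Goodbye", 10             ; hypothetical second literal
string_at_index_1_len   equ $ - string_at_index_1   ; 8

section .text
    global _start
_start:
    mov   eax, 1                        ; __NR_write
    mov   edi, 1                        ; stdout
    mov   rsi, string_at_index_0        ; pointer to the literal, not its contents
    mov   edx, string_at_index_0_len    ; the length value itself, not an address
    syscall

    mov   eax, 1                        ; rax held write's return value, so reload it
    mov   edi, 1
    mov   rsi, string_at_index_1
    mov   edx, string_at_index_1_len
    syscall

    mov   eax, 60                       ; __NR_exit
    xor   edi, edi
    syscall                             ; _exit(0)
```

Because the lengths are `equ` constants, no .bss storage or run-time stores are needed for them.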

See also How to load address of function or label into register - mov rsi, msg is the least efficient way to do that, despite being the most obvious and simple.
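For comparison, a sketch of the same program using a RIP-relative `lea`, which is the usual position-independent way to get a label's address on x86-64. This assumes Linux and a build along the lines of `nasm -f elf64 hello.asm` followed by `ld -o hello hello.o`; the file names are placeholders.

```nasm
section .rodata
msg: db "Hello World", 10
.len equ $ - msg

section .text
    global _start
_start:
    mov   eax, 1            ; __NR_write
    mov   edi, 1            ; stdout
    lea   rsi, [rel msg]    ; RIP-relative address: 7 bytes, works in PIE and non-PIE code
    ; mov esi, msg          ; 5-byte alternative, only valid in a non-PIE executable
    ;                       ; where static addresses fit in 32 bits
    mov   edx, msg.len
    syscall                 ; write(1, msg, 12)

    mov   eax, 60           ; __NR_exit
    xor   edi, edi
    syscall                 ; _exit(0)
```

By contrast, `mov rsi, msg` assembles to a 10-byte instruction carrying a full 64-bit absolute address.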

Peter Cordes