I'm using FASM, and this is my program:

format ELF64

section '.text' executable

public func

func:
        vmovaps ymm0, YWORD [.table]
        xor     rax, rax
        ret

        align 32
        .table:
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      2048
                DQ      2048
                DQ      2048
                DQ      2048

I'm using AVX, so I created a table (which must be aligned on a 32-byte boundary) to initialize the ymm0 register. But when I try to assemble this program, FASM reports a "section is not aligned enough" error. ".table" must be aligned on a 32-byte boundary because I'm using "vmovaps" (or "vmovdqa"; it doesn't matter which). Why does FASM give me this error? Is it wrong to use 'align' like this?

UPDATE: Is it correct to do something like this? With this change the program works without any problem, but is it the right way?

section '.text' executable

public func

func:
        vmovaps ymm0, YWORD [.table]
        xor     rax, rax
        ret

        section '.rodata' align 32

        .table:
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      2048
                DQ      2048
                DQ      2048
                DQ      2048
Peter Cordes
Jason
  • Note that there's very little benefit to putting data right next to code. x86 CPUs use split I/D L1 caches and split iTLB / dTLB (but a unified L2 cache and L2 TLB). Normally it's best to group all your constants together to avoid polluting I-cache with data and vice versa; that's why `section .rodata` is a thing. Also, with AVX / AVX2 you can use a `vpbroadcastq` or `vbroadcastsd` load so you only need one `dq 1024` to fill a register. This lets you "compress" constants for free if you're going to load them into registers anyway, except it doesn't work with `vpaddq ymm0, [mem]` for example – Peter Cordes Sep 29 '19 at 14:18
  • thx man ... good point ... – Jason Sep 29 '19 at 14:25
  • But there is one point: if we have a very long table and need to use it in multiple functions, it's best to define the table once in the rodata section, because if we put the table at the end of each function, the final file will be far too large. Right? – Jason Sep 29 '19 at 14:34
  • You don't want to duplicate constants in general, even if they're small. Although I guess it can make sense to repeat a scalar float or double if that means all the constants for a function are in the same cache line. (instead of loading one of the constants from a separate place near the constants for another function that it shares only 1 small constant with.) Anyway, avoiding duplication is a separate issue from grouping all rodata together into one section. You want to do both. – Peter Cordes Sep 29 '19 at 14:45
  • BTW, `xor rax, rax` is inefficient. Use `xor eax,eax` to avoid a REX prefix and still zero the full register. (IDK if FASM does that optimization for you or not). – Peter Cordes Sep 29 '19 at 14:59
  • @PeterCordes For repeatedly used constants, there is a special ELF feature that merges equal constants in appropriate sections. – fuz Sep 29 '19 at 18:02
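
The broadcast-load and `xor eax,eax` suggestions from the comments above can be sketched as follows. This is only an illustration, not code from the thread: the label `const1024` is a made-up name, and the sketch assumes a CPU with AVX2 (for `vpbroadcastq`) and a FASM version that knows the instruction.

```asm
format ELF64

section '.text' executable

public func

func:
        ; AVX2: copy one 8-byte constant into all four qword lanes of ymm0.
        ; The memory operand is only 8 bytes, so no 32-byte alignment is needed.
        vpbroadcastq ymm0, qword [const1024]
        xor     eax, eax        ; writing eax zero-extends into rax; no REX prefix
        ret

section '.rodata'

const1024:                      ; hypothetical label, one copy of the constant
        dq      1024
```

After the load, ymm0 holds the same four quadwords of 1024 that the original `vmovaps` read from the first 32 bytes of the table, at the cost of 8 bytes of constant data instead of 32.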

1 Answer

With FASM, alignment inside a section can't be greater than the alignment of the section itself. When you don't specify a section's alignment, the default is 8 for ELF64 and 4 for ELF. To change it, add align to the section declaration like this:

section '.text' executable align 32

This should allow you to use alignment up to 32 within the section. Your code could have looked like this:

section '.text' executable align 32

public func
public func2

func:
        vmovaps ymm0, YWORD [.table]
        xor     rax, rax
        ret

        align 32
        .table:
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      2048
                DQ      2048
                DQ      2048
                DQ      2048

func2:
        vmovaps ymm0, YWORD [.table]
        xor     rax, rax
        ret

        align 32
        .table:
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      1024
                DQ      2048
                DQ      2048
                DQ      2048
                DQ      2048

Alternatively, you can put the constant data in the .rodata (read-only) section, separate from the code in the .text section. Multiple functions can then share that data. You can place different tables and other data there and use the align directive inside the section to align the specific items that require it. This code doesn't do anything useful, but serves as an example:

FORMAT ELF64

section '.text' executable

public func
public func2
public func3
public func4

func:
        vmovaps ymm0, YWORD [table]
        xor     rax, rax
        ret

func2:
        vmovaps ymm0, YWORD [table2]
        mov     eax, MyStr
        ret

func3:
        vmovaps ymm0, YWORD [table]
        xor     rax, rax
        ret

func4:
        vmovaps ymm0, YWORD [table3]
        xor     rax, rax
        ret

section '.rodata' align 32

MyStr: DB 'Hello There', 0

align 32
table:
        DQ      1024
        DQ      1024
        DQ      1024
        DQ      1024
        DQ      2048
        DQ      2048
        DQ      2048
        DQ      2048

align 32
table2:
        DQ      1024
        DQ      1024
        DQ      1024
        DQ      1024
        DQ      2048
        DQ      2048
        DQ      2048
        DQ      2048

align 32        ; already aligned here because table2 is exactly 64 bytes; kept explicit for safety
table3:
        DQ      1024
        DQ      1024
        DQ      1024
        DQ      1024
        DQ      2048
        DQ      2048
        DQ      2048
        DQ      2048

Note: In this example all the table data is the same, but in a real situation tables would have whatever pertinent values you require.
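
Each table above holds eight quadwords (64 bytes), while a single `vmovaps ymm0, YWORD [table]` reads only the first 32 bytes. As a minimal sketch, assuming you wanted both halves of one table in registers, a second aligned load at offset 32 would fetch the rest. The function name `load_both` is hypothetical and is not part of the code above.

```asm
format ELF64

section '.text' executable

public load_both

load_both:                                      ; hypothetical function name
        vmovaps ymm0, yword [table]             ; bytes 0..31: four qwords of 1024
        vmovaps ymm1, yword [table + 32]        ; bytes 32..63: four qwords of 2048
        xor     eax, eax
        ret

section '.rodata' align 32

align 32
table:
        dq      1024, 1024, 1024, 1024
        dq      2048, 2048, 2048, 2048
```

Because `table` starts on a 32-byte boundary, `table + 32` is 32-byte aligned as well, so both `vmovaps` loads satisfy the alignment requirement.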

Michael Petch
  • @jason: What you did in your update will work. If you have other data in `.rodata` that needs its own alignment, you can still use `align` within the section, up to the alignment of the section itself. I've updated my answer with another way that is closer to what you originally had. – Michael Petch Sep 29 '19 at 14:11
  • Thanks. A question: if I use align 32 for my text section, does it make my program larger (in size)? For example, I may have over 20 functions and only one of them needs 32-byte alignment. Is it better to use one "section '.text' executable align 32" header for all functions, or to use "section '.text' executable align 32" only for the function that needs align 32 and a plain "section '.text' executable" header for the other functions (which don't need align 32)? – Jason Sep 29 '19 at 14:19
  • About your last update: I think it's better for each table to be at the end of the function that uses it, so listing all the tables in "rodata" seems like a bad idea and it's better to define each table at the end of its function. Right? – Jason Sep 29 '19 at 15:42
  • @jason: If you review the updates, there are two versions I present: one where the tables are in the functions and one where the tables are in `.rodata`. The idea was to show you how to do it in a separate section if you wanted to. – Michael Petch Sep 29 '19 at 15:46
  • Yes, thanks. But the question is which one is better to use if we have a large number of functions and a large number of tables? – Jason Sep 29 '19 at 15:54
  • @jason If you have a very long table and use it from multiple functions, then you'd probably want just one copy of it, so it may make sense to have it globally accessible in `.rodata`. If each function uses its own unique table, you might place it with the code for readability. If you find yourself duplicating the same table repeatedly, then it's probably not the right way if you are concerned about memory space (which you seemed to be in an earlier comment). – Michael Petch Sep 29 '19 at 16:01