I have the following function (cleaned up a bit to make it easier to understand), which takes the destination array, gets the element at index n, adds src1[n] to it, and then multiplies it by src2[n] (nothing too fancy):

static void F(long[] dst, long[] src1, long[] src2, ulong n) 
{
    dst[n] += src1[n];
    dst[n] *= src2[n];
}

Now this generates the following ASM:

<Program>$.<<Main>$>g__F|0_0(Int64[], Int64[], Int64[], UInt64)
    L0000: sub rsp, 0x28
    L0004: test r9, r9
    L0007: jl short L0051
    L0009: mov rax, r9
    L000c: mov r9d, [rcx+8]
    L0010: movsxd r9, r9d
    L0013: cmp rax, r9
    L0016: jae short L0057
    L0018: lea rcx, [rcx+rax*8+0x10]
    L001d: mov r9, rcx
    L0020: mov r10, [r9]
    L0023: mov r11d, [rdx+8]
    L0027: movsxd r11, r11d
    L002a: cmp rax, r11
    L002d: jae short L0057
    L002f: add r10, [rdx+rax*8+0x10]
    L0034: mov [r9], r10
    L0037: mov edx, [r8+8]
    L003b: movsxd rdx, edx
    L003e: cmp rax, rdx
    L0041: jae short L0057
    L0043: imul r10, [r8+rax*8+0x10]
    L0049: mov [rcx], r10
    L004c: add rsp, 0x28
    L0050: ret
    L0051: call 0x00007ffc9dadb710
    L0056: int3
    L0057: call 0x00007ffc9dadbc70
    L005c: int3

As you can see, it adds a bunch of stuff, and because I can guarantee that n will be within the legal range, I can use pointers:

static unsafe void G(long* dst, long* src1, long* src2, ulong n) 
{
    dst[n] += src1[n];
    dst[n] *= src2[n];
}

Now this generates much simpler ASM:

<Program>$.<<Main>$>g__G|0_1(Int64*, Int64*, Int64*, UInt64)
    L0000: lea rax, [rcx+r9*8]
    L0004: mov rcx, rax
    L0007: mov rdx, [rdx+r9*8]
    L000b: add [rcx], rdx
    L000e: mov rdx, [rax]               ; loads the value again?
    L0011: imul rdx, [r8+r9*8]
    L0016: mov [rax], rdx
    L0019: ret

As you may have noticed, there is an extra MOV there (I think; at least I can't see why it is there).

Question

  • How can I remove that line? In C I could use the restrict keyword, if I'm not wrong. Is there such a keyword in C#? Sadly, I couldn't find anything on the internet.

Note

  • Here is SharpLab link.
  • Here is the C example:
void
f(int64_t  *dst, 
  int64_t  *src1, 
  int64_t  *src2, 
  uint64_t  n) {
        dst[n] += src1[n];
        dst[n] *= src2[n];
}

void
g(int64_t *restrict dst, 
  int64_t *restrict src1,  
  int64_t *restrict src2, 
  uint64_t          n) {
        dst[n] += src1[n];
        dst[n] *= src2[n];
}

This generates:

f:
        mov     r10, rdx
        lea     rdx, [rcx+r9*8]
        mov     rax, QWORD PTR [rdx]
        add     rax, QWORD PTR [r10+r9*8]
        mov     QWORD PTR [rdx], rax       ; this is strange. It loads the value back to [RDX]?
                                           ; shouldn't that be other way around? I don't know.
        imul    rax, QWORD PTR [r8+r9*8]
        mov     QWORD PTR [rdx], rax
        ret

g:
        mov     r10, rdx
        lea     rdx, [rcx+r9*8]
        mov     rax, QWORD PTR [rdx]
        add     rax, QWORD PTR [r10+r9*8]
        imul    rax, QWORD PTR [r8+r9*8]
        mov     QWORD PTR [rdx], rax
        ret

and here is the Godbolt link.

  • `mov [rdx], rax` **stores** (not loads) the value back to `dst[n]`, *in case* that's the same memory location as `src2[n]` (which `imul rax, [r8+r9*8]` reads). When we know it can't be the same location (because the source promised that with `restrict`), the store can be omitted. No idea what "other way around" you mean, but other than comparing addresses and branching, I don't think there's another reasonable way to resolve the possible aliasing. – Peter Cordes Apr 24 '21 at 10:02
  • @PeterCordes by other way around I meant: `MOV RAX, [RDX]`. Because it is working with `RAX` I thought that it should load the value in `RAX` again. Probably I'm wrong here. –  Apr 24 '21 at 10:09
  • Note the difference between your C# and C compiler outputs. Your C# output is using a memory *destination* add, so it foolishly has to reload from a location it *knows* it just wrote (`dst[n]`), as well as `src2[n]`. That's dumb, it knows what value is there (because I think the C# thread model allows it to assume no other thread has written that location). (Note that the C# asm copies the LEA result from RAX to RCX for no reason at all, so `[rax]` and `[rcx]` are literally just the same pointer dereferenced twice.) Without `restrict` in C, you just get an extra store and the same loads. – Peter Cordes Apr 24 '21 at 12:09

1 Answer

This:

dst[n] = (dst[n] + src1[n]) * src2[n];

removes that extra mov.
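
A compiler cannot make this rewrite by itself because the two forms are only equivalent when dst[n] and src2[n] are different memory locations. The sketch below is in C, to match the question's C example; the function names two_step, one_step and demo are made up for illustration. It shows that the two forms agree for distinct arrays and diverge when the same array is passed as both dst and src2.

```c
#include <assert.h>
#include <stdint.h>

/* Two-statement form from the question: src2[n] is read AFTER dst[n] is written. */
void two_step(int64_t *dst, const int64_t *src1, const int64_t *src2, uint64_t n)
{
    dst[n] += src1[n];
    dst[n] *= src2[n];
}

/* Single-expression form: every input is read before the one store. */
void one_step(int64_t *dst, const int64_t *src1, const int64_t *src2, uint64_t n)
{
    dst[n] = (dst[n] + src1[n]) * src2[n];
}

/* Hypothetical demo comparing both forms with and without aliasing. */
void demo(void)
{
    /* Distinct arrays: both forms compute (3 + 2) * 10. */
    int64_t a[1] = {3}, b[1] = {2}, c[1] = {10};
    int64_t d[1] = {3};
    two_step(a, b, c, 0);
    one_step(d, b, c, 0);
    assert(a[0] == 50 && d[0] == 50);

    /* dst and src2 are the same array: the forms diverge. */
    int64_t x[1] = {3}, y[1] = {3};
    two_step(x, b, x, 0);   /* x becomes 3 + 2 = 5, then 5 * 5 */
    one_step(y, b, y, 0);   /* (3 + 2) * 3, using the old value of y */
    assert(x[0] == 25);
    assert(y[0] == 15);
}
```

So the single-expression form is a safe replacement only if the caller never passes overlapping memory for dst and src2, which is the same promise restrict makes in C.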

C# has no equivalent of the restrict qualifier from C.

The C# language specification (ECMA-334:2017), in chapter 23, "Unsafe code", has no syntax to specify that a region of memory must be accessed only through a specific pointer, and no syntax to specify that the memory regions pointed to by different pointers do not overlap. Thus there is no such equivalent. This is probably because C# is a managed language: the unsafe syntax that allows working with pointers and unmanaged memory is already an edge case in C#, and restrict on pointers would be an edge case of that edge case.

Robert Harvey
Renat
  • It does. But why? It does that in my C example too. How is the combined version that you provided different? –  Apr 24 '21 at 09:54
  • @Hrant: Because it reads both / all sources before writing anywhere, so aliasing no longer means that `src2[n]` might be a reload of `dst[n]`. (This would be a better answer if it explained that. OTOH, explaining the actual effect of possible aliasing on compiler optimization is a big detour from asking whether C# supports a `restrict` equivalent, other than the usual reading things into local vars or temporaries before other stores. Perhaps find a C or C++ Q&A that explains it with examples so you can remove that side-question from this question) – Peter Cordes Apr 24 '21 at 10:03
  • @Renat: It's not possible to have two *different* objects overlap, but in the OP's example where the accesses are at the same index, `dst[n]` is the same object as `src2[n]` if called as `F(a, b, a, 123)`. (i.e. **passing the *same* array object twice, as two different args creates aliasing without `unsafe`**.) Except when functions can inline enough to see where their arrays are coming from (e.g. separately allocated), they need to assume aliasing. – Peter Cordes Apr 24 '21 at 12:13
  • I'd guess that `restrict` wouldn't be worth it because current C# JIT implementations wouldn't take advantage of it. In the OP's example, it's even reloading `dst[n]` for no reason at all. (C compilers write it before later loads when aliasing is possible, but reuse the known value from a register. It's not `volatile` or `atomic` so they can assume no other thread has changed the value, just like a C# compiler could if it was smart enough.) – Peter Cordes Apr 24 '21 at 12:17
  • @PeterCordes, thank you for mentioning the scenario of passing the same pointer twice; I completely missed that. I removed the statement about different objects' memory, as it's irrelevant. – Renat Apr 24 '21 at 12:20