In the Intel Manual Vol. 3 there is an example of loads being reordered with earlier stores.
Initially x = y = 0
Core 1:
mov [x], 1
mov r2, [y]
Core 2:
mov [y], 1
mov r1, [x]
So r1 = r2 = 0 is possible. The question is: does requiring acquire-release semantics prohibit this scenario? On x86 every store is already a release store, so I think not. Example:
Core 1:
release(mov [x], 1)
mov r2, [y]
Core 2:
mov [y], 1
acquire(mov r1, [x])
In this case, if acquire(mov r1, [x]) observes 0, the only thing that can be concluded is that release(mov [x], 1) does not synchronize-with acquire(mov r1, [x]) in terms of the C11 memory model. Acquire-release therefore provides no guarantee that would prohibit reordering mov [y], 1 and acquire(mov r1, [x]) on Core 2.
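
To make the question concrete, here is a minimal C++ sketch of the second example as a litmus test. It rests on several assumptions that are mine, not the manual's: `release(...)` and `acquire(...)` are mapped to `std::memory_order_release` and `std::memory_order_acquire`, the two unannotated accesses to y are mapped to `memory_order_relaxed` (plain non-atomic accesses would be a data race in C++), and `LitmusResult` and `run_store_buffering` are hypothetical names made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Outcome of repeating the two-thread test (hypothetical helper type).
struct LitmusResult {
    int iterations;  // number of trials run
    int both_zero;   // trials that ended with r1 == 0 and r2 == 0
};

// Runs the acquire/release variant of the example `iterations` times.
LitmusResult run_store_buffering(int iterations) {
    assert(iterations >= 0);
    LitmusResult result{iterations, 0};

    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0}, y{0};   // initially x = y = 0
        std::atomic<int> ready{0};     // crude start barrier
        int r1 = -1, r2 = -1;

        std::thread core1([&] {
            ready.fetch_add(1);
            while (ready.load() < 2) std::this_thread::yield();
            x.store(1, std::memory_order_release);      // release(mov [x], 1)
            r2 = y.load(std::memory_order_relaxed);     // mov r2, [y]
        });
        std::thread core2([&] {
            ready.fetch_add(1);
            while (ready.load() < 2) std::this_thread::yield();
            y.store(1, std::memory_order_relaxed);      // mov [y], 1
            r1 = x.load(std::memory_order_acquire);     // acquire(mov r1, [x])
        });

        // join() makes the writes to r1 and r2 visible to this thread.
        core1.join();
        core2.join();

        if (r1 == 0 && r2 == 0) ++result.both_zero;
    }
    return result;
}
```

Calling `run_store_buffering(100000)` and inspecting `both_zero` shows whether the r1 = r2 = 0 outcome was observed on a given machine. The count depends on timing and hardware, so a result of zero does not prove the outcome is forbidden; the language model permits it with these orderings.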