In the book "A Primer on Memory Consistency and Cache Coherence" by Daniel J. Sorin et al., I found the following paragraph about an RMW optimization in the SC model:
More aggressive implementations of RMWs leverage the insight that SC requires only the appearance of a total order of all requests. Thus, an atomic RMW can be implemented by first having a core obtain the block in state M in its cache, if the block is not already there in that state. The core then needs to only load and store the block in its cache—without any coherence messages or bus locking—as long as it waits to service any incoming coherence request for the block until after the store. This waiting does not risk deadlock because the store is guaranteed to complete.
Can somebody explain to me what this means? If we have two cores, the first performing an RMW on memory block A and the second issuing a write request to block A, what will happen?
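The two-core scenario can be modeled with a toy Python simulation. This is a minimal sketch under simplifying assumptions, not the book's protocol: there is a single block A, only the states M and I, a directory-like `owner` pointer, and the names `System`, `get_m`, `rmw_begin`, `rmw_end` and `store` are hypothetical, invented for illustration. Core1's write request arrives between Core0's RMW load and store; Core0 holds the request until its store has finished, so Core1's write is ordered after the whole RMW and can never land between the load and the store.

```python
from collections import deque


class Core:
    """One core's private cache, tracking a single block A (toy M/I protocol)."""

    def __init__(self, name):
        self.name = name
        self.state = "I"          # "M" = exclusive read-write copy, "I" = no copy
        self.value = None         # cached data of block A (valid only in M)
        self.in_rmw = False       # True between the RMW's load and its store
        self.deferred = deque()   # coherence requests postponed during the RMW
        self.pending_store = None # store waiting for the block to arrive in M


class System:
    """Hypothetical directory-style interconnect for a single block A."""

    def __init__(self, cores, initial):
        self.cores = {c.name: c for c in cores}
        self.memory = initial
        self.owner = None         # name of the core holding A in M, if any
        self.log = []             # order in which events took effect

    def get_m(self, requester):
        """Requester asks for A in state M (what a store miss does)."""
        if self.owner is None:
            self._grant(requester, self.memory)
            return
        owner = self.cores[self.owner]
        if owner.in_rmw:
            # Idea from the quoted passage: the owner does not answer the
            # coherence request until its RMW store has completed.
            owner.deferred.append(requester)
            self.log.append(f"{owner.name} defers GetM from {requester}")
        else:
            data = owner.value
            owner.state, owner.value = "I", None  # owner invalidates its copy
            self._grant(requester, data)

    def _grant(self, requester, data):
        core = self.cores[requester]
        core.state, core.value = "M", data
        self.owner = requester
        self.log.append(f"{requester} gets A in M (value={data})")
        if core.pending_store is not None:        # finish the delayed store
            core.value, core.pending_store = core.pending_store, None
            self.log.append(f"{requester} stores {core.value}")

    def store(self, name, value):
        core = self.cores[name]
        if core.state == "M":
            core.value = value
            self.log.append(f"{name} stores {value}")
        else:
            core.pending_store = value
            self.get_m(name)

    def rmw_begin(self, name):
        core = self.cores[name]
        if core.state != "M":
            self.get_m(name)                      # step 1: obtain A in M
        assert core.state == "M", "toy model: assumes M is granted at once"
        core.in_rmw = True
        loaded = core.value                       # load hits locally, no bus traffic
        self.log.append(f"{name} RMW loads {loaded}")
        return loaded

    def rmw_end(self, name, new_value):
        core = self.cores[name]
        core.value = new_value                    # store hits in M, so it always
        core.in_rmw = False                       # completes: no deadlock
        self.log.append(f"{name} RMW stores {new_value}")
        while core.deferred:                      # now service postponed requests
            self.get_m(core.deferred.popleft())


c0, c1 = Core("Core0"), Core("Core1")
system = System([c0, c1], initial=0)

old = system.rmw_begin("Core0")    # Core0: fetch-and-increment, load half
system.store("Core1", 100)         # Core1's write request arrives mid-RMW
system.rmw_end("Core0", old + 1)   # Core0: store half, then answers Core1

for line in system.log:
    print(line)
```

In this model the final value of A is Core1's 100, and the RMW observed 0 and wrote 1 atomically; the deferred request is the only thing that enforces this, since the RMW itself sends no coherence messages once Core0 holds A in M.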