Multiple threads accessing one variable

Question

I found this question in a textbook I am reading. The solution is given below it as well. I'm having trouble understanding how the minimum could be 2. Why couldn't one thread read 0, have all the other threads run to completion, and then write 1? And whether the result is 1 or 2, doesn't the thread that writes last still have to complete its own loop?

int n = 0;
int main(int argc, char **argv) {
    for (i = 0; i < 5; i++) {
        int tmp = n;
        tmp = tmp + 1;
        n = tmp;
    }
    return 0;
}

If a single thread ran this application, you would expect the final output to be 5. What if 5 threads ran the same loop in parallel? What are the largest and smallest values n could have? The largest should be self-evident: 25, with 5 increments from 5 threads. However, reasoning about the smallest possible value is more difficult. Hint: n can be less than 5, but it is up to you to figure out why.

Solution:

With five threads running this five-iteration loop and with no protection from concurrent accesses, the lowest value that n can reach is two. Understanding how to reach this result is easiest when working backwards from the final result. For the final output to be two, a thread must have read a value of one from n, incremented it, and then written two. That means that another thread wrote one, implying that it also initially read zero (which is also the starting value for n). This accounts for the behavior of two of the five threads. However, for this behavior to occur the results of the other three threads must have been overwritten. Two valid executions could accomplish this. Either 1) all three threads began and completed execution between the first thread reading zero and writing one, or 2) all three threads began and completed execution between the final thread reading one and writing two. Both execution orderings are valid.

Well this just causes undefined behaviour, in C11 threads, since there is no memory fence or atomics in use. — M.M, Jul 10 '15 at 23:14
@MattMcNabb Yes. I should have included the first part as well. "You have seen that unsafe accesses from multiple threads cause unpredictable results. But you have also seen that some guarantees can be extracted from unsafe accesses (that is, if every thread writes a 1 then the final value cannot magically be something else). Consider the following code snippet:" — John, Jul 10 '15 at 23:17
@UserNotDefined: Absolutely false. If every thread writes a 1, and there is a race condition causing undefined behaviour, the result can be anything. Your application can just crash. — gnasher729, Jul 10 '15 at 23:33

Arkku · Accepted Answer · 2015-08-09T23:32:04.747

Assuming every thread has a local i (i.e., every thread will run for 5 iterations no matter what), let's try to get 1 as the result. This would mean the last thread to write a value would have to read 0 for n on its 5th iteration. The only way this could happen is if no thread has yet written to n at the start of that thread's 5th iteration, yet for that thread to be on its 5th iteration that thread itself must have written to n, hence it is not possible.

Thus the smallest possible result is 2, which can occur, e.g., as follows: the last thread to write n has completed 4 iterations, then another thread writes 1, the last thread reads the 1 at the start of its 5th iteration, all other threads complete all their iterations before the last thread, and finally the last thread completes its 5th iteration writing the 2.
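The interleaving described above can be replayed deterministically in a single thread. The following is only a sketch under the same assumptions as this answer (each thread has its own `i` and `tmp`, and every read and write of `n` is an indivisible step); the helper names `load`, `store`, `iterate`, and `replay_minimum_schedule` are made up for illustration and do not come from the textbook.

```c
#define THREADS 5
#define ITERATIONS 5

static int n;                /* the shared variable from the question */
static int tmp[THREADS];     /* each simulated thread's private tmp   */

/* First half of one loop iteration:  int tmp = n; */
static void load(int t)  { tmp[t] = n; }

/* Second half of one loop iteration: tmp = tmp + 1; n = tmp; */
static void store(int t) { tmp[t] = tmp[t] + 1; n = tmp[t]; }

/* Runs `times` complete, uninterrupted iterations of simulated thread t. */
static void iterate(int t, int times) {
    for (int k = 0; k < times; k++) {
        load(t);
        store(t);
    }
}

/* Replays one schedule that ends with n == 2 and returns the final n. */
int replay_minimum_schedule(void) {
    n = 0;
    load(1);                        /* thread B reads 0                        */
    iterate(0, ITERATIONS - 1);     /* thread A completes 4 iterations (n = 4) */
    store(1);                       /* B writes 1, wiping out A's increments   */
    load(0);                        /* A begins its 5th iteration and reads 1  */
    iterate(1, ITERATIONS - 1);     /* B finishes its remaining 4 iterations   */
    for (int t = 2; t < THREADS; t++) {
        iterate(t, ITERATIONS);     /* the other three threads run completely  */
    }
    store(0);                       /* A writes 2 as the very last store       */
    return n;
}
```

Calling `replay_minimum_schedule()` returns 2, and every simulated thread still performs exactly five load/store pairs, which is why the last writer cannot have read 0 on its final iteration.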

Disclaimer: I am answering the conceptual question about multithreading – as others have pointed out, the lack of atomicity might lead to undefined behaviour and arbitrary results if the C code presented were used as is. Based on the question's “self-evident” largest number case I'm guessing the textbook's author either doesn't realise this, or is using a C-like pseudo code to illustrate the concept. If the former, then the correct answer would be that the book is wrong, but I think the answer in the latter case is also educational.
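For contrast with the caveat above, here is a minimal sketch of a variant whose behaviour is defined in C11. It assumes POSIX threads are available, and the names `counter`, `worker`, and `run_atomic_counter` are hypothetical. It replaces the separate read and write of `n` with a single `atomic_fetch_add`, so no increment can be lost.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_THREADS 5
#define NUM_ITERATIONS 5

static atomic_int counter;   /* plays the role of n, but with atomic access */

/* Each thread has its own loop counter i and increments atomically. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < NUM_ITERATIONS; i++) {
        atomic_fetch_add(&counter, 1);   /* indivisible read-modify-write */
    }
    return NULL;
}

/* Starts the threads, waits for them, and returns the final count
   (or -1 if a thread could not be created). */
int run_atomic_counter(void) {
    pthread_t threads[NUM_THREADS];
    int created = 0;

    atomic_store(&counter, 0);
    while (created < NUM_THREADS &&
           pthread_create(&threads[created], NULL, worker, NULL) == 0) {
        created++;
    }
    for (int t = 0; t < created; t++) {
        pthread_join(threads[t], NULL);
    }
    return created == NUM_THREADS ? atomic_load(&counter) : -1;
}
```

With this variant the result is 25 on every run, so the "largest and smallest" puzzle only exists for the unsynchronised version.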

Disclaimer: I am answering the conceptual question about multithreading – as others have pointed out, the non-atomicity of writes might lead to undefined behaviour and arbitrary results. Based on the question's “self-evident” largest number case I'm guessing the textbook's author either doesn't realise this, or is using a C-like pseudo code to illustrate the concept. — Arkku, Jul 10 '15 at 23:41
As an additional note, the assumption I make that `i` be local to each thread is necessary for the textbook's answer to be correct even if undefined behaviour and atomicity issues are ignored. If `i` were shared, then a thread could run for only one iteration (other threads having incremented `i` to make the loop condition false) and set `n = 1`. Hence it seems safe to assume that the textbook's author intended a non-shared `i`. — Arkku, Jul 11 '15 at 00:10
Explanation for downvote? I feel I've explained the assumptions I've made, and if someone is downvoting because I'm “answering an exercise”, note that the OP already had the answer from their textbook… — Arkku, Aug 09 '15 at 23:20

Riptyde4 · Answer 2 · 2015-07-10T23:14:37.387

Just some insight to add: adding, subtracting, etc. in C using an operator such as + is more than just one operation. Down at the assembly level, the + operation is composed of multiple instructions. If multiple threads access one variable and these instructions interleave badly, the end result can be horribly incorrect. This is another reason why we need things like mutexes, semaphores, and condition variables.

This is a C question, not C++ – M.M Jul 10 '15 at 23:14
@MattMcNabb Corrected, but still true for C. – Riptyde4 Jul 10 '15 at 23:14
Adding is a single operation, but loads and stores are separate (even if apparently encoded in a single instruction on a CISC) – o11c Jul 10 '15 at 23:21
@o11c Where in the C standard do I find that? Or do you mean it happens to be on some platform(s)? – David Schwartz Jul 12 '15 at 04:19
@DavidSchwartz It's down at assembly level in every platform – Riptyde4 Aug 06 '15 at 14:55
@o11c I was referring to the add at C++ level and just simply meant to say that a C++ add operation translates to multiple assembly instructions which in turn are all at risk for interleaving poorly – Riptyde4 Aug 06 '15 at 14:56
@Riptyde4 But almost every C++ operation translates into multiple assembly instructions, and for almost everything, we can ignore this. So this answer just creates an even more complex and harder to answer question -- why does this matter for these operations and not for any other? An answer that just poses a more complicated question is just not a helpful answer. This question is based on misunderstandings and this answer just furthers them, IMO. – David Schwartz Aug 09 '15 at 18:27

David Schwartz · Answer 3 · 2015-07-11T15:46:15.817

The largest should be self-evident: 25, with 5 increments from 5 threads.

Totally and completely wrong. Whoever said this should not ever be listened to (at least about things involving threading), period.

 int tmp = n;
 tmp = tmp + 1;
 n = tmp;

Imagine a CPU that had no increment operation, but had an efficient "add 10" operation and an efficient "subtract nine" operation. On such a CPU, tmp = tmp + 1; could be optimized to tmp += 10; tmp -= 9;. The compiler could also optimize out tmp entirely by operating on n.

So this code could become the equivalent of:

n += 10;
n -= 9;

Now imagine this happens: All five threads add 10, so n is now 50. The first thread reads the 50, the other four threads subtract 9. The first thread subtracts 9 from the 50 it read and writes 41. So when all is done, n is 41.
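The arithmetic of that schedule can be checked with a small single-threaded replay. This is only a sketch of the hypothetical "add 10, subtract 9" machine described above, with each thread running the loop body once; the function name `replay_add10_sub9_schedule` is invented for illustration.

```c
#define THREADS 5

/* Replays the schedule: every thread executes n += 10, then the first
   thread's n -= 9 is split into a read and a write around the others. */
int replay_add10_sub9_schedule(void) {
    int n = 0;

    for (int t = 0; t < THREADS; t++) {
        n += 10;                 /* all five threads add 10: n == 50        */
    }

    int first_read = n;          /* first thread reads 50 for its n -= 9    */

    for (int t = 1; t < THREADS; t++) {
        n -= 9;                  /* the other four subtract 9: n == 14      */
    }

    n = first_read - 9;          /* first thread writes 50 - 9, i.e. 41     */
    return n;
}
```

The function returns 41, which exceeds the 5 that five single increments would give, so the stale read inflates the result instead of losing updates.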

So what is claimed to be self-evident is utterly false. Whoever wrote that doesn't understand threading in C.

if every thread writes a 1 then the final value cannot magically be something else

Also utterly and completely false. Consider a CPU that writes a 1 by first writing a 0 and then incrementing the value. If this happens on two cores, the final result could be 2. This textbook was written by someone who fundamentally doesn't understand threading and undefined behavior.

(I'm assuming this textbook isn't limited to some special context in which what it's saying is true. For example, it might be using "C-like" code as a form of platform-neutral assembly language and it might be making assumptions about platforms in which aligned integers have specific guarantees. But if that's so, what it's teaching does not translate to C code at all and would only apply to people writing assembly code on CPUs whose rules match the textbook's assumptions.)

edited Jul 11 '15 at 15:46

answered Jul 10 '15 at 23:17

Even if it is optimized by the compiler, couldn't the threads still run serially at runtime and end with n=25, i.e. (1+1+1+1+1)*5? – John Jul 10 '15 at 23:20
Right, but the point is that the maximum is *not* self-evidently 25. Even with just five threads running that loop once, `n` could be 41! – David Schwartz Jul 10 '15 at 23:35
Even barring any compiler optimizations? – John Jul 10 '15 at 23:38
@UserNotDefined Yes, even barring any compiler optimizations, because any optimization the compiler could do could be done by the CPU or some other component of the system. (You either have a guarantee or you don't, you can't synthesize a guarantee by ruling out the things you can imagine that would violate the guarantee.) – David Schwartz Jul 10 '15 at 23:39
@UserNotDefined Also because the idea that the compiler must naively convert the C code into the closest imaginable corresponding assembly code or it's "optimizing" it is just false. There might be a CPU that has no increment operation at all, so making the increment out of smaller operations would be a necessity, not an optimization. This is C code, and it means what it means, what is required to happen. – David Schwartz Jul 10 '15 at 23:45
I'd really appreciate if the downvoters would explain the downvotes. If I'm unclear about something, I'd like to clarify it. If I'm wrong about something, I'd like to correct it. – David Schwartz Jul 12 '15 at 04:18

score 0 · Answer 4 · answered Jul 10 '15 at 23:08

The point is that the threads are sharing the same instance of data. Also, it seems to be assumed that all the other threads run at the same rate of execution.

Therefore as each thread rounds the loop (getting to the i++ part of the for), they all increment i nearly simultaneously, so it is as if the code were written:

 for (i = 0; i < 5; i++, i++, i++, i++, i++)
    ...

at least in the extreme case which gives the minimum number of iterations.
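The shared-`i` reading can also be replayed deterministically. The sketch below assumes that `i` is a single variable shared by all threads and that each read or write of `i` and `n` is an indivisible step; the function name `replay_shared_i_schedule` is hypothetical. Under those assumptions one particular schedule leaves `n` at 1.

```c
/* Replays a schedule with a SHARED loop counter i; returns the final n. */
int replay_shared_i_schedule(void) {
    int n = 0;
    int i;        /* shared by all simulated threads (assumption) */
    int tmp_a;    /* thread A's private tmp                       */

    i = 0;        /* all five threads execute their `i = 0` first */

    /* Thread A passes the test i < 5 and executes: int tmp = n; */
    tmp_a = n;    /* A reads 0 and is then suspended              */

    /* Thread B runs its loop until the shared i reaches 5. */
    while (i < 5) {
        int tmp = n;
        tmp = tmp + 1;
        n = tmp;
        i++;
    }

    /* Threads C, D, and E now find i < 5 false and do nothing. */

    /* Thread A resumes: tmp = tmp + 1; n = tmp; i++ */
    tmp_a = tmp_a + 1;
    n = tmp_a;    /* A overwrites B's result with 1               */
    i++;          /* i becomes 6, so A's loop ends as well        */

    return n;
}
```

The function returns 1, which is below the textbook's stated minimum of 2; that answer therefore only holds if each thread has its own `i`.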

wallyk

56,922
16
83
148

But with the minimum number of iterations, to me it seems like the answer would be 5 though. Can you explain why it winds up being 2? – John Jul 10 '15 at 23:25
@UserNotDefined: The steps are, `i = 0` (first iteration), then `i=5`. So it could actually execute only one iteration. I think the answer of `2` is not correct. – wallyk Jul 10 '15 at 23:27
