I don't understand how optimistic concurrency can be implemented in C++11

Question

I'm trying to implement a protected variable that does not use locks in C++11. I have read a little about optimistic concurrency, but I can't understand how it can be implemented, either in C++ or in any other language.

The way I'm trying to implement the optimistic concurrency is by using a 'last modification id'. The process I'm doing is:

Take a copy of the last modification id.
Modify the protected value.
Compare the local copy of the modification id with the current one.
If the above comparison is true, commit the changes.

The problem I see is that, after comparing the 'last modification ids' (local copy and current one) and before committing the changes, there is no way to ensure that no other thread has modified the value of the protected variable.

Below is an example. Let's suppose that many threads are executing this code and sharing the variable var.

#include <atomic>
#include <cstddef>

/**
 * This struct is intended to implement a protected variable,
 * but using optimistic concurrency instead of locks.
 */
struct ProtectedVariable final {

   ProtectedVariable() : var(0), lastModificationId(0){ }

   int getValue() const {
      return var.load();
   }

   void setValue(int val) {
      // This method is not atomic: another thread could change the value
      // of 'var' before this one is able to increment the 'last modification id'.
      var.store(val);
      lastModificationId.store(lastModificationId.load() + 1);
   }

   size_t getLastModificationId() const {
      return lastModificationId.load();
   }

private:
   std::atomic<int> var;
   std::atomic<size_t> lastModificationId;
};



ProtectedVariable var;


/**
 * Suppose this function writes a value to some sort of database.
 * 'currModifId' is the modification id read at the start of the transaction.
 */
int commitChanges(int val, size_t currModifId){
   // Now, if nobody has changed the value of 'var', commit its value;
   // otherwise, retry the transaction.
   if(var.getLastModificationId() == currModifId) {

      // Here is one of the problems. After comparing both ids, another
      // thread could modify the value of 'var', hence I would be
      // performing the commit with a corrupted value.
      var.setValue(val);

      // Again, the same problem as above.
      writeToDatabase(val);

      // Return 'ok' if everything has gone well.
      return 0;
   } else {
      // If someone has changed the value of 'var' while this thread was
      // calculating and committing it, return an error.
      return -1;
   }
}

/**
 * This function is intended to be atomic, but without using locks.
 */
void modifyVar(){
   // Get the modification id for checking whether or not some
   // thread has modified the value of 'var' after committing it.
   size_t currModifId = var.getLastModificationId();

   // Get a local copy of 'var'.
   int currVal = var.getValue();

   // Perform some operations based on the current value of
   // 'var'.
   int newVal = currVal + 1 * 2 / 3;

   if(commitChanges(newVal, currModifId) != 0){
      // If someone has changed the value of 'var' while this thread was
      // calculating and committing it, retry the transaction.
      modifyVar();
   }
}

I know that the above code is buggy, but I don't understand how to implement something like the above in a correct way, without bugs.

`int newVal = currVal + 1 * 2 / 3;` is equivalent to `int newVal = currVal;` because integer mul div `1 * 2 / 3` yields zero. — user7860670, Mar 26 '18 at 06:48
And databases which use optimistic concurrency also use locks internally. — Yola, Mar 26 '18 at 07:08
@Yola So, optimistic concurrency is only valid in very specific cases, isn't it? Which are those cases? — Dan, Mar 26 '18 at 10:07

Frax · Accepted Answer · 2020-10-08T11:27:12.087

Optimistic concurrency doesn't mean that you don't use locks; it merely means that you don't hold the locks during most of the operation.

The idea is that you split your modification into three parts:

Initialization, like getting the lastModificationId. This part may need locks, but not necessarily.
Actual computation. All expensive or blocking code goes here (including any disk writes or network code). The results are written in such a way that they do not overwrite the previous version. The likely way it works is by storing the new values next to the old ones, indexed by the not-yet-committed version.
Atomic commit. This part is locked, and must be short, simple, and non-blocking. The likely way it works is that it just bumps the version number, after confirming that no other version was committed in the meantime. No database writes at this stage.

The main assumption here is that the computation part is much more expensive than the commit part. If your modification is trivial and the computation cheap, then you can just use a lock, which is much simpler.

Some example code structured into these 3 parts could look like this:

struct Data {
  ...
};

...

std::mutex lock;
volatile const Data* value;  // The protected data
volatile int current_value_version = 0;

...

bool modifyProtectedValue() {
  // Initialize.
  int version_on_entry = current_value_version;

  // Compute the new value, using the current value.
  // We don't have any lock here, so it's fine to make heavy
  // computations or block on I/O.
  Data* new_value = new Data;
  compute_new_value(value, new_value);

  // Commit or fail.
  bool success;
  lock.lock();
  if (current_value_version == version_on_entry) {
    value = new_value;
    current_value_version++;
    success = true;
  } else {
    success = false;
  }
  lock.unlock();
  
  // Roll back in case of failure.
  if (!success) {
    delete new_value;
  }

  // Inform caller about success or failure.
  return success;
}

// It's cleaner to keep retry logic separately.
bool retryModification(int retries = 5) {
  for (int i = 0; i < retries; ++i) {
    if (modifyProtectedValue()) {
      return true;
    }
  }
  return false;
}

This is a very basic approach, and the rollback in particular is trivial. In a real-world example, re-creating the whole Data object (or its counterpart) would likely be infeasible, so the versioning would have to be done somewhere inside, and the rollback could be much more complex. But I hope it shows the general idea.
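As a hedged sketch of the same three-part structure, the variant below keeps both the version and the pointer behind one mutex, so they are always read together, and uses std::shared_ptr so that old versions are freed once nobody reads them. The names VersionedValue, Snapshot, read, tryCommit and incrementOnce, as well as the counter field of Data, are illustrative assumptions and not part of the answer above.

```cpp
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical payload; the answer above leaves the fields of Data elided.
struct Data {
  int counter = 0;
};

class VersionedValue {
 public:
  // The current value together with the version it belongs to.
  struct Snapshot {
    std::shared_ptr<const Data> value;
    unsigned long version;
  };

  VersionedValue() : value_(std::make_shared<Data>()), version_(0) {}

  // Initialization phase: a very short critical section, so that the
  // pointer and the version are read as one consistent pair.
  Snapshot read() const {
    std::lock_guard<std::mutex> guard(lock_);
    return Snapshot{value_, version_};
  }

  // Commit phase: short, no I/O. Fails if another commit happened since
  // 'expected_version' was read.
  bool tryCommit(unsigned long expected_version,
                 std::shared_ptr<const Data> new_value) {
    std::lock_guard<std::mutex> guard(lock_);
    if (version_ != expected_version) {
      return false;
    }
    value_ = std::move(new_value);
    ++version_;
    return true;
  }

 private:
  mutable std::mutex lock_;
  std::shared_ptr<const Data> value_;
  unsigned long version_;
};

// One optimistic attempt; the caller decides whether to retry on failure.
bool incrementOnce(VersionedValue& shared) {
  VersionedValue::Snapshot snap = shared.read();    // 1. initialize
  auto next = std::make_shared<Data>(*snap.value);  // 2. compute, no lock held
  next->counter += 1;
  return shared.tryCommit(snap.version, std::move(next));  // 3. commit or fail
}
```

If tryCommit fails, the rejected object is simply dropped when its shared_ptr goes out of scope, which plays the role of the rollback. A retry wrapper like retryModification above would call incrementOnce in a loop.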

For those reading this response, `volatile` in C++ (and C) has nothing to do with concurrency (unlike Java). — Paul J. Lucas, Oct 07 '20 at 16:37
Also, computing the new value based on the current value where the latter is non-trivial (which is implied by the `struct`) without a lock is also a bad idea since different parts of the `struct` may have been updated while the computation is progressing. — Paul J. Lucas, Oct 07 '20 at 16:44
@PaulJLucas It has everything to do with concurrency, also when used for device communication. For multi-threading modern C++ has std::atomic, which has much stronger guarantees and replaces volatile, but that is not important for this example. — Frax, Oct 08 '20 at 07:46
The second comment: the whole point is that the struct is not modified without the locking. I have to admit that the code in my answer has some flaws: it doesn't guarantee atomicity of multi-byte reads (the version number and current value pointer) and it leaks memory (no garbage collection for old versions), but that was for the sake of simplicity. — Frax, Oct 08 '20 at 07:52
Actually, I should probably explain the importance of `volatile` (or `std::atomic`) here: without the `volatile` keyword the compiler would be free to assume that the variable is not changed by any other code while `modifyProtectedValue()` is executing. With this assumption, the check `current_value_version == version_on_entry` always returns true (because we have just assigned `int version_on_entry = current_value_version;`), so the compiler would remove the whole if statement and leave just the body of the then branch, rendering the whole check useless. — Frax, Oct 08 '20 at 11:31

mipnw · Answer 2 · 2018-03-26T07:04:25.163


If I understand your question, you want to make sure that var and lastModificationId are either both changed, or neither is.

Why not use std::atomic<T> where T would be a structure that holds both the int and the size_t?

struct VarWithModificationId {
  int var;
  size_t lastModificationId;
};

class ProtectedVariable {
private:
  std::atomic<VarWithModificationId> protectedVar;

public:
  // Add your public setter/getter methods here.
  // If two threads access protectedVar, each is guaranteed a 'consistent'
  // view of that variable, but the setter will need to use a lock.
};
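Whether std::atomic of a struct is lock-free depends on the size of the struct and on the platform. As a rough sketch, and assuming a 32-bit modification id is acceptable, the value and the id can be packed into 8 bytes without padding, which many 64-bit platforms can update with a single compare-and-swap. The names VarWithId, load, tryStore and isLockFree are hypothetical, and isLockFree should be checked on the actual target rather than assumed.

```cpp
#include <atomic>
#include <cstdint>

// Value and modification id packed into 8 bytes, with no padding.
struct VarWithId {
  std::int32_t var;
  std::uint32_t id;
};

class ProtectedVariable {
 public:
  ProtectedVariable() : state_(VarWithId{0, 0}) {}

  // Reads the value and its id as one consistent pair.
  VarWithId load() const { return state_.load(); }

  // Succeeds only if nobody has committed since 'seen' was loaded.
  // On success the id is bumped together with the value.
  bool tryStore(VarWithId seen, std::int32_t new_var) {
    VarWithId desired{new_var, seen.id + 1};
    return state_.compare_exchange_strong(seen, desired);
  }

  // Reports whether this platform implements the atomic without a lock.
  bool isLockFree() const { return state_.is_lock_free(); }

 private:
  std::atomic<VarWithId> state_;
};
```

A caller would load the pair, compute the new value from it, and call tryStore with the pair it loaded; a false return means another thread committed first and the computation has to be repeated.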

edited Mar 26 '18 at 07:04

answered Mar 26 '18 at 06:56


That's not going to be lock-less. – MSalters Mar 26 '18 at 06:57
Yes, but actually I would like to understand how std::atomic is implemented in that case, because I know that it uses hardware-specific support for primitive types, but how can it work with 'complex' data structures? – Dan Mar 26 '18 at 06:59
@Dan: As I commented, by using a lock. You'll find that `protectedVar.is_lock_free()` returns `false` in this example. – MSalters Mar 26 '18 at 07:04

score 0 · Answer 3 · answered Mar 26 '18 at 07:03


The key here is acquire-release semantics and test-and-increment. Acquire-release semantics are how you enforce an order of operations. Test-and-increment is how you choose which thread wins in case of a race.

Your problem therefore is the .store(lastModificationId+1). You'll need .fetch_add(1). It returns the old value. If that's not the expected value (from before your read), then you lost the race and retry.
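One way to make the "lost the race, retry" step concrete is a compare-exchange loop directly on the value. Note that this swaps fetch_add for compare_exchange_weak, and that sharedVar, computeNewValue and runThreads are hypothetical names used only for this sketch.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<int> sharedVar(0);

// Hypothetical stand-in for the real computation on the current value.
int computeNewValue(int current) { return current + 1; }

// Optimistic update: compute from a snapshot, then commit only if the
// variable still holds that snapshot; otherwise recompute and retry.
void modifyVar() {
  int seen = sharedVar.load();
  int desired;
  do {
    desired = computeNewValue(seen);
    // On failure, compare_exchange_weak stores the current value in 'seen',
    // so the next iteration recomputes from fresh data.
  } while (!sharedVar.compare_exchange_weak(seen, desired));
}

// Runs 'threads' workers that each call modifyVar 'perThread' times and
// returns the final value of sharedVar.
int runThreads(int threads, int perThread) {
  sharedVar.store(0);
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([perThread] {
      for (int i = 0; i < perThread; ++i) {
        modifyVar();
      }
    });
  }
  for (auto& worker : pool) {
    worker.join();
  }
  return sharedVar.load();
}
```

Because the commit compares the value itself, a change from A to B and back to A goes unnoticed. That is harmless when the new value depends only on the current value, as here, but it is a reason to keep a separate modification id when it is not.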

answered Mar 26 '18 at 07:03

MSalters


You are right, that solves the problem in the method `setValue`, but `modifyVar` is still not a secure transaction, is it? – Dan Mar 26 '18 at 07:42
That does solve the incrementation, but not the rest of the issues. Also, fetch_add is actually a lock, just low-level one. – Frax Mar 26 '18 at 08:17
@Dan: In `modifyVar`, you'd use the same approach (which in fact makes the `lastModificationId` redundant). You calculate `newValue`, and use `.compare_exchange_strong(oldValue, newValue)`. – MSalters Mar 26 '18 at 10:58

score 0 · Answer 4 · answered Mar 26 '18 at 10:16

Optimistic concurrency is used in database engines when different users are expected to access the same data only rarely. It could go like this:

First, the user reads the data and its timestamp. The user works with the data for some time, then checks whether the timestamp in the DB has changed since the data was read; if it hasn't, the user updates the data and the timestamp.

Internally, however, the DB engine uses locks for the update anyway: while holding the lock, it checks whether the timestamp has changed and, if it hasn't, updates the data. The time for which the data is locked is simply shorter than with pessimistic concurrency. So you also need to use some kind of locking.
