How to implement a simple MVCC data structure

Question

I am reading about different concurrency models and their features, but no text explains how to implement a simple MVCC data structure. Say I have to implement a simple array-based data structure that provides MVCC-based concurrency. What should my code look like?

I understand that MVCC (multiversion concurrency control) basically means:

1) Read isolation - your writes should not block reads

2) Timestamp based ordering for establishing happens-before relation/ordering.

Do I need to keep in mind any other aspects?

Also, my code below handles the first requirement, but how do I implement timestamp ordering?

class MVCCArray {

    private int[] arr;

    MVCCArray(int n) {
        arr = new int[n];
    }

    // non-blocking reads
    public int getItem(int index) {
        return arr[index];
    }

    // blocking writes
    public synchronized void setItem(int index, int value) {
        arr[index] = value;
    }

}

PS: I want to understand how this is implemented in a generic way. Please refrain from explaining how it is implemented in a particular database.

MVCC is not worth much outside of a transactional context (like a database). The idea is that your writes don't replace the existing data, they create a copy and make the changes there, so reads in concurrent transactions see the old data. — cHao, Aug 25 '16 at 22:24
@cHao: Thanks for your comments. I understand that reads might see old data. I want to understand how to implement this in code. — Rajesh Pantula, Aug 25 '16 at 23:32
That's the thing. Not might. _Do_. This is one of the design goals of MVCC: that transactions already in progress before the write see the data as it was before it was written, especially if the transaction that's modifying the data hasn't been committed yet. You're nowhere close to MVCC while you're modifying data in-place. — cHao, Aug 26 '16 at 00:44
Do you mean that having a setter method like the one in my code above is not how MVCC should be implemented? Is there any sample code or implementation that I can refer to as an example? — Rajesh Pantula, Aug 26 '16 at 08:40
Sure, there are implementations out there. In PostgreSQL, for example. But you don't want database-specific explanations, and I can't think of a toy implementation like the one you apparently want. MVCC is quite a bit more complex than you think it is, and if anyone goes through the effort of covering it in any significant detail in a post on SO, I'll be very surprised. — cHao, Aug 26 '16 at 13:54
Take a look at https://en.wikipedia.org/wiki/Multiversion_concurrency_control for more info and some links to resources. — cHao, Aug 26 '16 at 14:01
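
The point made in the comments above, that a write adds a new version instead of overwriting data in place, can be illustrated with a minimal sketch. This is not taken from any database or from the thread itself: the class name VersionedArray and the methods beginSnapshot, getItem and setItem are hypothetical, every write is treated as its own immediately committed transaction, and old versions are never garbage-collected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: every slot keeps all of its versions, keyed by commit timestamp.
final class VersionedArray {
    private final List<ConcurrentSkipListMap<Long, Integer>> slots = new ArrayList<>();
    // Timestamp of the newest committed write; volatile so readers see it without locking.
    private volatile long lastCommitted = 0;

    VersionedArray(int n) {
        for (int i = 0; i < n; i++) {
            ConcurrentSkipListMap<Long, Integer> versions = new ConcurrentSkipListMap<>();
            versions.put(0L, 0); // initial version, visible to every snapshot
            slots.add(versions);
        }
    }

    // A reader calls this once, then reads everything "as of" the returned timestamp.
    long beginSnapshot() {
        return lastCommitted;
    }

    // Non-blocking read: newest version whose timestamp is <= the reader's snapshot.
    int getItem(int index, long snapshot) {
        return slots.get(index).floorEntry(snapshot).getValue();
    }

    // Writers are serialized, but they never touch a version a reader may be using.
    synchronized long setItem(int index, int value) {
        long ts = lastCommitted + 1;
        slots.get(index).put(ts, value); // append the new version first
        lastCommitted = ts;              // then publish it to new snapshots
        return ts;
    }
}
```

A reader that took its snapshot before a setItem call keeps seeing the old value for as long as it keeps using that snapshot, while a snapshot taken afterwards sees the new value.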

Samuel Squire · Answer 1 · 2022-05-15T05:49:54.597

I wrote a multiversion concurrency control implementation as a simulation. See the simulation runner. The simulation runs 100 threads that all try to read and write two numbers, A and B; each thread wants to increment each number by 1. We set A to 1 and B to 2 at the beginning of the simulation.

The desired outcome is that A and B are 101 and 102 at the end of the simulation. This can only happen if there is locking, or serialization enforced by multiversion concurrency control. Without concurrency control or locking, the final values can end up below 101 and 102 because of data races.

When a thread reads A or B, we iterate over the versions of that key to find one whose timestamp is <= transaction.getTimestamp() and for which committed.get(key) == that version. If the read succeeds, it records the reading transaction as the last reader of that value: rts.put("A", transaction)
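
The read rule described above can be sketched roughly as follows. This is an illustration, not the code from the simulation: the VersionChain class, its nested Version class and all method names are hypothetical, and the sketch stores the last reader's timestamp in a plain field where the answer keeps the reading transaction itself in an rts map.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the version list kept for a single key such as "A".
final class VersionChain {
    static final class Version {
        final int value;
        boolean committed; // flipped to true when the writing transaction commits
        Version(int value, boolean committed) {
            this.value = value;
            this.committed = committed;
        }
    }

    // Versions ordered by the timestamp of the transaction that wrote them.
    private final TreeMap<Long, Version> versions = new TreeMap<>();
    // Timestamp of the transaction that last read this key (-1 means never read).
    private long lastReaderTimestamp = -1;

    synchronized void write(long writerTs, int value, boolean committed) {
        versions.put(writerTs, new Version(value, committed));
    }

    synchronized void markCommitted(long writerTs) {
        Version v = versions.get(writerTs);
        if (v != null) {
            v.committed = true;
        }
    }

    // Newest committed version with timestamp <= txTs, or null if none is visible.
    synchronized Integer read(long txTs) {
        for (Map.Entry<Long, Version> e
                : versions.headMap(txTs, true).descendingMap().entrySet()) {
            if (e.getValue().committed) {
                lastReaderTimestamp = txTs; // remember who read last
                return e.getValue().value;
            }
        }
        return null;
    }

    synchronized long lastReaderTimestamp() {
        return lastReaderTimestamp;
    }
}
```

Uncommitted versions are skipped, so a transaction with timestamp 6 ignores an uncommitted version written at timestamp 5 and falls back to the newest committed version before it.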

At commit time, we check whether rts.get("A").getTimestamp() != committingTransaction.getTimestamp(). If it is, a different transaction has read the value since we did, so we abort the transaction and try again.

We also check whether someone has committed since the transaction began, because we don't want to overwrite their commit.
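
A minimal sketch of this "did someone commit since I began?" check, applied to a single number, might look like the following. It is a simplification and not the simulation's code: ValidatingStore, Tx, begin, commit and increment are hypothetical names, a logical counter stands in for timestamps, and the retry loop simply yields before trying again.

```java
// Hypothetical sketch: optimistic increments validated at commit time.
final class ValidatingStore {
    // What a transaction remembers from the moment it began.
    static final class Tx {
        final long startTs;
        final int snapshot;
        Tx(long startTs, int snapshot) {
            this.startTs = startTs;
            this.snapshot = snapshot;
        }
    }

    private int value;
    private long lastCommitTs = 0; // timestamp of the most recent commit
    private long clock = 0;        // logical clock shared by begin and commit

    ValidatingStore(int initial) {
        value = initial;
    }

    synchronized Tx begin() {
        return new Tx(++clock, value);
    }

    // Returns false (abort) if another transaction committed after tx began.
    synchronized boolean commit(Tx tx, int newValue) {
        if (lastCommitTs > tx.startTs) {
            return false; // committing now would overwrite someone else's commit
        }
        value = newValue;
        lastCommitTs = ++clock;
        return true;
    }

    synchronized int current() {
        return value;
    }

    // Retry loop: begin, compute from the snapshot, try to commit, restart on abort.
    void increment() {
        while (true) {
            Tx tx = begin();
            if (commit(tx, tx.snapshot + 1)) {
                return;
            }
            Thread.yield(); // let the transaction that beat us make progress
        }
    }
}
```

With 100 threads each calling increment() once on a store that starts at 1, every lost race is detected and retried, so the final value is 101.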

For each write, we also check whether the other writing transaction is younger than us; if it is, we abort. The condition lives in a method called shouldRestart, which is called on reads, at commit time, and on all transactions that touched a value.

public boolean shouldRestart(Transaction transaction, Transaction peek) {
    boolean defeated =  (((peek.getTimestamp() < transaction.getTimestamp() ||
            (transaction.getNumberOfAttempts() < peek.getNumberOfAttempts())) && peek.getPrecommit()) ||
            peek.getPrecommit() && (peek.getTimestamp() > transaction.getTimestamp() ||
                    (peek.getNumberOfAttempts() > transaction.getNumberOfAttempts() && peek.getPrecommit())
                    && !peek.getRestart()));

    return defeated;
}

See the code here. The && peek.getPrecommit() clause means that a younger transaction can abort if a later transaction gets ahead and the later transaction hasn't been restarted (aborted). Precommit occurs at the beginning of a commit.

During a read of a key, we check whether the timestamp of the RTS entry is lower than that of our transaction. If so, we abort the transaction and restart: someone is ahead of us in the queue and they need to commit.

On average, the system reaches 101 and 102 after fewer than about 300 transaction aborts, with many runs finishing well below 200 attempts.

EDIT: I changed the formula for calculating which transaction wins. If the other transaction is younger, or the other transaction has a higher number of attempts, the current transaction aborts. This reduces the number of attempts.

EDIT: The reason for the high abort counts was that a committing thread would be starved by reading threads that kept aborting and restarting because of the committing thread. I added a Thread.yield() when a read fails due to a transaction that is ahead; this reduces restart counts to <200.
