0

Here is a demo which demonstrates the problem with not having transaction "locking". It sort of simulates async / concurrency using setTimeout. I have never dealt with concurrency in languages like C, Go, or Rust, so I am not really sure how it works in implementation detail, but I am trying to grasp the concept of MVCC.

const db = {
  table1: {
    records: [
      { id: 1, name: 'foo', other: 'hello' },
      { id: 2, name: 'bar', other: 'world' },
    ]
  }
}

function readTable1(id) {
  return db.table1.records.find(x => x.id === id)
}

function writeTable1(id) {
  const record = readTable1(id)
  return new Promise((res, rej) => {
    console.log('transaction 1 start')
    setTimeout(() => {
      record.other = 'qwerty'

      setTimeout(() => {
        record.name = 'asdf'
        console.log('transaction 1 done')
        res()
      }, 1000)
    }, 1000)
  })
}

function wait(ms) {
  return new Promise((res) => setTimeout(res, ms))
}

async function test1() {
  writeTable1(1)
  console.log(readTable1(1))
  await wait(1100)
  console.log(readTable1(1))
  await wait(2200)
  console.log(readTable1(1))
}

test1()

It logs

transaction 1 start
{ id: 1, name: 'foo', other: 'hello' } // read
{ id: 1, name: 'foo', other: 'qwerty' } // read
transaction 1 done
{ id: 1, name: 'asdf', other: 'qwerty' } // read

In the middle while the transaction is processing the record, it changes the real record which can be concurrently read. There are no locks on it, or however MVCC does it without locks (using multiple versions of records). I next try to implement how I think MVCC works, with hopes that you can correct my understanding. Here is that.

const db = {
  table1: {
    records: [
      [{ id: 1, name: 'foo', other: 'hello' }],
      [{ id: 2, name: 'bar', other: 'world' }],
    ]
  }
}

function readTable1(id) {
  const idx = db.table1.records.findIndex(x => x[0].id === id)
  return [idx, db.table1.records[idx][0]]
}

// this is a long transaction.
function writeTable1(id) {
  const [idx, record] = readTable1(id)
  // create a new version of record for transaction to act on.
  const newRecordVersion = {}
  Object.keys(record).forEach(key => newRecordVersion[key] = record[key])
  db.table1.records[idx].push(newRecordVersion)

  return new Promise((res, rej) => {
    console.log('transaction 2 start')
    setTimeout(() => {
      newRecordVersion.other = 'qwerty'

      setTimeout(() => {
        newRecordVersion.name = 'asdf'
        console.log('transaction 2 done')

        // now "commit" the changes
        commit()

        res();
      }, 1000)
    }, 1000)
  })

  function commit() {
    db.table1.records[idx].shift()
  }
}

function wait(ms) {
  return new Promise((res) => setTimeout(res, ms))
}

async function test1() {
  writeTable1(1)
  console.log(readTable1(1)[1])
  await wait(1100)
  console.log(readTable1(1)[1])
  await wait(2200)
  console.log(readTable1(1)[1])
  console.log(db.table1.records)
}

test1()

That outputs this, which seems correct.

transaction 2 start
{ id: 1, name: 'foo', other: 'hello' }
{ id: 1, name: 'foo', other: 'hello' }
transaction 2 done
{ id: 1, name: 'asdf', other: 'qwerty' }
[
  [ { id: 1, name: 'asdf', other: 'qwerty' } ],
  [ { id: 2, name: 'bar', other: 'world' } ]
]

Is this correct, generally how it works? Mainly, how many versions per record are created in a real implementation? Can there be more than 2 versions at a time? If so, in what situations does that occur generally speaking? And how do the timestamps work? I read about the timestamps on the wiki page, but it doesn't really register to me how to implement it. Also the incrementing transaction IDs. So basically how those 3 pieces fit together (versioning, timestamps, and transaction IDs).

I am looking for some sort of simulation of the timestamps and versioning in JavaScript, so I can make sure I understand the general concepts at a high level, yet at a sort of rough approximation of an implementation level. Just knowing what MVCC is and reading a few papers isn't enough in the weeds to know how to implement it.

In my example there will only ever be 2 versions of a record during a transaction. I am not sure if there are cases where you would need more than that. And I am not sure how to plug in the timestamps.

Lance
  • 75,200
  • 93
  • 289
  • 503

1 Answers1

0

Short answer: Multiversion concurrency control in "databases" involves several different things; different vendors implement each of these things in several different ways.

https://dba.stackexchange.com/questions/174791/does-sql-server-use-multiversion-concurrency-control-mvcc

Does SQL Server really implement MVCC anywhere

Yes, since SQL Server 2005.

The SQL Server terminology is "row-versioning isolation levels". See the product documentation tree starting at Locking and Row Versioning. Note in particular that there are two separate "MVCC" implementations, read committed isolation using row versioning (RCSI) and snapshot isolation (SI).

and how does that reconcile with the idea of with (tablock, holdlock) if it does?

Using that combination of hints serializes access to whole table(s). It is the least concurrent option available, so using these hints should be very rare. Whether a particular use could be replaced with RCSI or SI isolation depends on the specific circumstances. You could ask a follow-up question with a specific example if you want us to address that aspect in detail.

You might also like to read my series of articles on SQL Server isolation levels.

It's a big topic, with no short, simple "one size fits all" reply.

As far as "simulating MVCC in JavaScript":

  • Javascript is inherently single-threaded; there is no concept of "threads" or "locking" in the language itself.
  • "Promises" (as you're using) are an excellent way to "order" asynchronous callbacks
  • We'd need more details on exactly what you're trying to accomplish.
FoggyDay
  • 11,962
  • 4
  • 34
  • 48
  • I'm trying to understand the implementation details of MVCC, and am using JavaScript (the language I am most familiar with) to be introduced to the topic. I know promised can be ordered, but I intentionally did it this way to simulate the multithreads in a way. I want to understand the implementation details so I can implement a database (too big of a topic to ask about). I tried to implement row versioning, hence my question about that, am I on the right track. I am not asking what MVCC is or what dbs use it. – Lance Feb 13 '22 at 02:39
  • Mainly I am wondering how the 3 things work: timestamp, versions, and the transaction ID, at an implementation sort of level. – Lance Feb 13 '22 at 02:55
  • One of the main points I was trying to make: MVCC doesn't prescribe any specific implementation. Suggestions: 1) think hard about what "requirements" you'd like for your database. Be sure to consider issues like how "asynchronous I/O" might affect "data integrity", and whether or not you need to support "transactions". 2) Read the links I cited, and see if they give you any good "design ideas" for your particular scenario. – FoggyDay Feb 14 '22 at 02:33