
Ok, so I'm trying to model a CLH-RW lock in Promela.

The way the lock works is simple, really:

The queue consists of a tail, to which both readers and writers enqueue a node containing a single bool, succ_must_wait. They do so by creating a new node and CAS-ing it with the tail.

The tail thereby becomes the node's predecessor, pred.

Then they spin-wait on pred.succ_must_wait until it is false.

Readers first increment a reader counter ncritR and then set their own flag to false, allowing multiple readers in the critical section at the same time. Releasing a read lock simply means decrementing ncritR again.

Writers wait until ncritR reaches zero, then enter the critical section. They do not set their flag to false until the lock is released.
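For reference, the protocol described above can be sketched with C11 atomics. This is a hypothetical illustration, not a verified implementation; the names (clhrw_t, make_node, etc.) are made up here, and memory reclamation is simplified to plain free:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical types sketching the CLH-RW protocol described above. */
typedef struct node {
    atomic_bool succ_must_wait;        /* successors spin on this flag */
} node_t;

typedef struct {
    _Atomic(node_t *) tail;            /* queue tail; enqueue CAS-es it */
    atomic_int ncritR;                 /* readers in the critical section */
} clhrw_t;

static node_t *make_node(bool wait) {
    node_t *n = malloc(sizeof *n);
    atomic_init(&n->succ_must_wait, wait);
    return n;
}

static void clhrw_init(clhrw_t *l) {
    atomic_init(&l->tail, make_node(false));   /* released sentinel */
    atomic_init(&l->ncritR, 0);
}

/* Swing tail to me; the old tail becomes my predecessor. */
static node_t *enqueue(clhrw_t *l, node_t *me) {
    node_t *pred = atomic_load(&l->tail);
    while (!atomic_compare_exchange_weak(&l->tail, &pred, me))
        ;                                      /* retry CAS until it wins */
    return pred;
}

static node_t *read_lock(clhrw_t *l) {
    node_t *me = make_node(true);
    node_t *pred = enqueue(l, me);
    while (atomic_load(&pred->succ_must_wait)) /* spin on predecessor */
        ;
    atomic_fetch_add(&l->ncritR, 1);
    atomic_store(&me->succ_must_wait, false);  /* let the successor in too */
    free(pred);                                /* simplification: pred is done */
    return me;
}

static void read_unlock(clhrw_t *l) {
    atomic_fetch_sub(&l->ncritR, 1);
}

static node_t *write_lock(clhrw_t *l) {
    node_t *me = make_node(true);
    node_t *pred = enqueue(l, me);
    while (atomic_load(&pred->succ_must_wait)) /* wait for predecessor */
        ;
    while (atomic_load(&l->ncritR) != 0)       /* drain active readers */
        ;
    free(pred);
    return me;
}

static void write_unlock(node_t *me) {
    atomic_store(&me->succ_must_wait, false);  /* flag drops only on release */
}
```

Note that each acquisition allocates a fresh node, which is exactly the property the array-based Promela encoding below loses.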

I'm kind of struggling to model this in Promela, though.

My current attempt (see below) uses arrays, where each node consists of a number of array entries.

This fails. Let's say A enqueues itself, then B enqueues itself. Then the queue will look like this:

S <- A <- B

Where S is a sentinel node.

The problem now is that when A runs to completion and re-enqueues, the queue will look like

S <- A <- B <- A'

In actual execution, this is absolutely fine because A and A' are distinct node objects. And since A.succ_must_wait will have been set to false when A first released the lock, B will eventually make progress, and therefore A' will eventually make progress.

What happens in the array-based Promela model below, though, is that A and A' occupy the same array positions, so B misses the fact that A has released the lock. This creates a deadlock: B is (wrongly) waiting for A' instead of A, while A' is (correctly) waiting for B.
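The aliasing can be replayed sequentially outside Promela. Here is a C sketch of the same array encoding (indices: 0 = S, 1 = A, 2 = B), showing that once A re-enqueues into its old slot, B reads A''s fresh flag instead of A's released one:

```c
#include <stdbool.h>

/* Same array encoding as the model: one slot per process id. */
bool succ_must_wait[3];
int  pred[3];
int  tail;                      /* slot 0 is the sentinel S */

static void enqueue(int id)     /* the model's enqueue step */
{
    succ_must_wait[id] = true;
    pred[id] = tail;
    tail = id;
}

/* Replay: A enqueues, B enqueues, A releases, A re-enqueues as A'.
   Returns the flag B observes on its predecessor afterwards. */
static bool flag_b_sees(void)
{
    enqueue(1);                 /* S <- A              */
    enqueue(2);                 /* S <- A <- B         */
    succ_must_wait[1] = false;  /* A releases the lock */
    enqueue(1);                 /* S <- A <- B <- A', but A' reuses slot 1 */
    return succ_must_wait[pred[2]];  /* true: B is now stuck behind A' */
}
```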

A possible "solution" to this could be to have A wait until B acknowledges the release. But that would not be true to how the lock works.

Another "solution" would be to wait for a change in pred.succ_must_wait, where a release increments succ_must_wait rather than resetting it to 0.

But I'm intending to model a version of the lock where pred may change (i.e. where a node may be allowed to disregard some of its predecessors), and I'm not entirely convinced that something like the increasing version wouldn't cause an issue with this change.

So what's the "smartest" way to model an implicit queue like this in Promela?

/* CLH-RW Lock */
/* pid: 0 = init, 1-2 = reader, 3-4 = writer */

ltl liveness{ 
    ([]<> reader[1]@progress_reader)
    && ([]<> reader[2]@progress_reader)
    && ([]<> writer[3]@progress_writer)
    && ([]<> writer[4]@progress_writer)
 }

bool initialised = 0;

byte ncritR;
byte ncritW;
byte tail;
bool succ_must_wait[5];
byte pred[5];

init{
    assert(_pid == 0);
    ncritR = 0;
    ncritW = 0;

    /*sentinel node*/
    tail =0;
    pred[0] = 0;
    succ_must_wait[0] = 0;
    initialised = 1;
}

active [2] proctype reader()
{
    assert(_pid >= 1);
    (initialised == 1)
    do
    :: else ->
        succ_must_wait[_pid] = 1;
        atomic {
            pred[_pid] = tail;
            tail = _pid;
        }

        (succ_must_wait[pred[_pid]] == 0)

        ncritR++;
        succ_must_wait[_pid] = 0;
        atomic {
            /*freeing previous node for garbage collection*/
            pred[_pid] = 0;
        }

        /*CRITICAL SECTION*/
progress_reader:
        assert(ncritR >= 1);
        assert(ncritW == 0);

        ncritR--;

        atomic {
            /*necessary to model the fact that the next access creates a new queue node*/
            if
            :: tail == _pid -> tail = 0;
            :: else ->
            fi
        }
    od
}

active [2] proctype writer()
{
    assert(_pid >= 1);
    (initialised == 1)
    do
    :: else -> 
        succ_must_wait[_pid] = 1;

        atomic {
            pred[_pid] = tail;
            tail = _pid;
        }

        (succ_must_wait[pred[_pid]] == 0)
        (ncritR == 0)
        atomic {
            /*freeing previous node for garbage collection*/
            pred[_pid] = 0;
        }
        ncritW++;

        /* CRITICAL SECTION */
progress_writer:
        assert(ncritR == 0);
        assert(ncritW == 1);

        ncritW--;
        succ_must_wait[_pid] = 0;

        atomic {
            /*necessary to model the fact that the next access creates a new queue node*/
            if
            :: tail == _pid -> tail = 0;
            :: else -> 
            fi
        }
    od
}
Patrick Trentin
User1291

1 Answer

First of all, a few notes:

  • You don't need to initialize your variables to 0, since:

    The default initial value of all variables is zero.

    see the docs.

  • You don't need to enclose a single instruction inside an atomic {} statement, since any elementary statement is executed atomically. For better efficiency of the verification process, whenever possible, you should use d_step {} instead. Here you can find a related Stack Overflow Q/A on the topic.

  • init {} is guaranteed to have _pid == 0 when one of the two following conditions holds:

    • no active proctype is declared
    • init {} is declared before any other active proctype appearing in the source code

    Active processes, including init {}, are spawned in order of appearance inside the source code. All other processes are spawned in order of appearance of the corresponding run ... statement.


I identified the following issues in your model:

  • the instruction pred[_pid] = 0 is useless, because that memory location is only read after the assignment pred[_pid] = tail.

  • When you release the successor of a node, you only set succ_must_wait[_pid] to 0 and you don't invalidate the node instance on which your successor is waiting. This is the problem that you identified in your question but were unable to solve. The solution I propose is to add the following code:

    pid j;
    for (j: 1..4) {
        if
            :: pred[j] == _pid -> pred[j] = 0;
            :: else -> skip;
        fi
    }
    

    This should be enclosed in an atomic {} block.

  • You correctly set tail back to 0 when the node that has just left the critical section is also the last node in the queue, and you correctly enclose this operation in an atomic {} block. However, just before that atomic {} block runs, some other process that was still waiting in an idle state may execute its initial atomic block and copy the current value of tail, which corresponds to the node that has just expired, into its own pred[_pid] memory location. If the node that has just exited the critical section now attempts to join the queue again, setting its own succ_must_wait[_pid] back to 1, you get another instance of circular wait among processes. The correct approach is to merge this part with the code releasing the successor.


The following inline function can be used to release the successor of a given node:

inline release_succ(i)
{
    d_step {
        pid j;
        for (j: 1..4) {
            if
                :: pred[j] == i ->
                    pred[j] = 0;
                :: else ->
                    skip;
            fi
        }
        succ_must_wait[i] = 0;
        if
            :: tail == _pid -> tail = 0;
            :: else -> skip;
        fi
    }
}

The complete model, follows:

byte ncritR;
byte ncritW;
byte tail;
bool succ_must_wait[5];
byte pred[5];

init
{
    skip
}

inline release_succ(i)
{
    d_step {
        pid j;
        for (j: 1..4) {
            if
                :: pred[j] == i ->
                    pred[j] = 0;
                :: else ->
                    skip;
            fi
        }
        succ_must_wait[i] = 0;
        if
            :: tail == _pid -> tail = 0;
            :: else -> skip;
        fi
    }
}

active [2] proctype reader()
{
loop:
    succ_must_wait[_pid] = 1;
    d_step {
        pred[_pid] = tail;
        tail = _pid;
    }

trying:
    (succ_must_wait[pred[_pid]] == 0)

    ncritR++;
    release_succ(_pid);

    // critical section    
progress_reader:
    assert(ncritR > 0);
    assert(ncritW == 0);

    ncritR--;

    goto loop;
}

active [2] proctype writer()
{
loop:
    succ_must_wait[_pid] = 1;

    d_step {
        pred[_pid] = tail;
        tail = _pid;
    }

trying:
    (succ_must_wait[pred[_pid]] == 0) && (ncritR == 0)

    ncritW++;

    // critical section
progress_writer:
    assert(ncritR == 0);
    assert(ncritW == 1);

    ncritW--;

    release_succ(_pid);

    goto loop;
}

I added the following properties to the model:

  • p0: the writer with _pid equal to 4 goes through its progress state infinitely often, provided that it is given the chance to execute some instruction infinitely often:

    ltl p0 {
       ([]<> (_last == 4)) ->
       ([]<> writer[4]@progress_writer)
    };
    

    This property should be true.

  • p1: there is never more than one reader in the critical section:

    ltl p1 {
        ([] (ncritR <= 1))
    };
    

    Obviously, we expect this property to be false in a model that matches your specification.

  • p2: there is never more than one writer in the critical section:

    ltl p2 {
        ([] (ncritW <= 1))
    };
    

    This property should be true.

  • p3: there isn't any node that is the predecessor of two other nodes at the same time, unless such node is node 0:

    ltl p3 {
        [] (
            (((pred[1] != 0) && (pred[2] != 0)) -> (pred[1] != pred[2])) &&
            (((pred[1] != 0) && (pred[3] != 0)) -> (pred[1] != pred[3])) &&
            (((pred[1] != 0) && (pred[4] != 0)) -> (pred[1] != pred[4])) &&
            (((pred[2] != 0) && (pred[3] != 0)) -> (pred[2] != pred[3])) &&
            (((pred[2] != 0) && (pred[4] != 0)) -> (pred[2] != pred[4])) &&
            (((pred[3] != 0) && (pred[4] != 0)) -> (pred[3] != pred[4]))
        )
    };
    

    This property should be true.

  • p4: it is always true that whenever the writer with _pid equal to 4 tries to access the critical section, it eventually gets there:

    ltl p4 {
        [] (writer[4]@trying -> <> writer[4]@progress_writer)
    };
    

    This property should be true.

The outcome of the verification matches our expectations:

~$ spin -search -ltl p0 -a clhrw_lock.pml 
...
Full statespace search for:
    never claim             + (p0)
    assertion violations    + (if within scope of claim)
    acceptance   cycles     + (fairness disabled)
    invalid end states      - (disabled by never claim)

State-vector 68 byte, depth reached 3305, errors: 0
...

~$ spin -search -ltl p1 -a clhrw_lock.pml 
...
Full statespace search for:
    never claim             + (p1)
    assertion violations    + (if within scope of claim)
    acceptance   cycles     + (fairness disabled)
    invalid end states      - (disabled by never claim)

State-vector 68 byte, depth reached 1692, errors: 1
...

~$ spin -search -ltl p2 -a clhrw_lock.pml
...
Full statespace search for:
    never claim             + (p2)
    assertion violations    + (if within scope of claim)
    acceptance   cycles     + (fairness disabled)
    invalid end states      - (disabled by never claim)

State-vector 68 byte, depth reached 3115, errors: 0
...

~$ spin -search -ltl p3 -a clhrw_lock.pml 
...
Full statespace search for:
    never claim             + (p3)
    assertion violations    + (if within scope of claim)
    acceptance   cycles     + (fairness disabled)
    invalid end states      - (disabled by never claim)

State-vector 68 byte, depth reached 3115, errors: 0
...

~$ spin -search -ltl p4 -a clhrw_lock.pml 
...
Full statespace search for:
    never claim             + (p4)
    assertion violations    + (if within scope of claim)
    acceptance   cycles     + (fairness disabled)
    invalid end states      - (disabled by never claim)

State-vector 68 byte, depth reached 3115, errors: 0
...