2

I'm writing some program in Go, I get this crash:

fatal error: concurrent map read and map write


consensus/bft.(*ConsensusManager).getHeightManager(0xc42009a7e0, 0x37, 0x0)

consensus/bft/bft_manager.go:246 +0x9b fp=0xc42b033258 sp=0xc42b033208 pc=0xaf1d7b
consensus/bft.(*HeightManager).Round(...)
consensus/bft/bft_manager.go:239
consensus/bft.(*ConsensusManager).Round(0xc42009a7e0, 0x37)

This is my code

type ConsensusManager struct {
    pm                      *ProtocolManager
    chain                   *core.BlockChain
    coinbase                common.Address
    readyValidators         map[common.Address]struct{}
    trackedProtocolFailures []string
    heights                 map[uint64]*HeightManager
    blockCandidates         map[common.Hash]btypes.Proposal

    currentBlock *types.Block
    found        chan *types.Block

    mu          sync.Mutex
    writeMapMu  sync.RWMutex
    getHeightMu sync.RWMutex
    processMu sync.Mutex
}
func (cm *ConsensusManager) Round() uint64 {
    return cm.getHeightManager(cm.Height()).Round()
}

func (cm *ConsensusManager) getHeightManager(h uint64) *HeightManager {

    if _, ok := cm.heights[h]; !ok {
        cm.heights[h] = NewHeightManager(cm, h)
    }

    return cm.heights[h]
}

I try to Use RWMutex to Luck, but the code can't work

func (cm *ConsensusManager) Round() uint64 {
        cm.getHeightMu.Lock()
    defer cm.getHeightMu.Unlock()

    return cm.getHeightManager(cm.Height()).Round()
}

func (cm *ConsensusManager) getHeightManager(h uint64) *HeightManager {
        cm.getHeightMu.Lock()
    defer cm.getHeightMu.Unlock()

        if _, ok := cm.heights[h]; !ok {
        cm.heights[h] = NewHeightManager(cm, h)
    }

    return cm.heights[h]
}

What's wrong with my solutions?

icza
  • 389,944
  • 63
  • 907
  • 827
frank
  • 41
  • 1
  • 3

1 Answers1

4

In Round() you lock the getHeightMu mutex, and you call getHeightManager(). In which you again try to lock the same mutex.

Go's mutex is not reentrant, which means it cannot be locked again by the same goroutine if it is already locked. For details see Recursive locking in Go.

Attempting to lock an already locked mutex is a blocking operation. It will block as long as the mutex gets unlocked and your goroutine is the lucky one that gets to lock it again (if other goroutines are also waiting on it). But the unlock would only come when Round() returns which requires getHeightManager() to complete, which will never happen. It is a deadlock.

You only need to lock the mutex in one place, close to were you access it. So only use locking inside getHeightManager(), and remove the locking from Round() (it doesn't access the protected resource anyway).

Also depending how frequently new height managers are created and added to the map, it may be profitable to use a sync.RWMutex. You could first lock it for reading only, and if the height manager already exists, you can return it. The good side of this is that multiple readers could access it simultaneously without blocking each other. Only if you find the height manager does not exist yet would you have to acquire a write lock.

icza
  • 389,944
  • 63
  • 907
  • 827