I'm new to Go and still learning its concurrency model. I wrote two concurrent solutions to the N-Queens problem and can't explain why the second one is significantly faster than the first (about 25 times faster), even though I believe they are almost equivalent.

The N-Queens Problem

LeetCode N-Queens

The First Solution

A semaphore (to limit the number of concurrently active workers) and a signal channel (connecting the workers to the control goroutine, which counts running workers to decide when to exit) are passed as context to the worker goroutines.

Here is the main goroutine function Solve():

var (
    MAX_THREADS = runtime.NumCPU() * 2
)
// Solve() searches the solution space of n-queens problem in parallel.
func Solve(n int) (ans int, err error) {

    // MAX_PROBLEM_SIZE is 32 because the diagonals need 2n-1 bits to encode while int64 has only 64 bits.
    if n > MAX_PROBLEM_SIZE {
        err = fmt.Errorf("problem size exceeds limit(%d)", MAX_PROBLEM_SIZE)
        return
    }

    workingLevel, _ := getWorkingLevel(n)

    ctx := SolverContext{
        signalChan:   make(chan SolverSignal, MAX_THREADS),      // channel that connects workers and the main thread
        pool:         semaphore.NewWeighted(int64(MAX_THREADS)), // a semaphore limiting how many workers run concurrently
        workingLevel: workingLevel,                              // on which row the workers are created
    }

    // kickstart the top level worker
    ctx.pool.Acquire(context.Background(), 1)
    ctx.signalChan <- WORKER_START
    go func() {
        grow(0, 0, 0, 0, n, &ctx)
        ctx.signalChan <- WORKER_FINISH
        ctx.pool.Release(1)
    }()

    workerNum := 0

    // The control goroutine counts the valid solutions and exits once no
    // workers remain running.
    for s := range ctx.signalChan {
        switch s {
        case WORKER_START:
            workerNum++
        case WORKER_FINISH:
            workerNum--
        case SOLUTION_FOUND:
            ans++
        }

        if workerNum == 0 {
            close(ctx.signalChan)
        }
    }

    return
}

The recursive function grow() tries to hand the search off to a new worker at the workingLevel instead of going deeper on the same goroutine.

func grow(colBits, slashBits, backslashBits int, row int, n int, ctx *SolverContext) {
    if row == n {
        // means all n queens are placed without meeting each other in any direction.
        ctx.signalChan <- SOLUTION_FOUND
        return
    }

    available := (1<<n - 1) &^ (colBits | slashBits | backslashBits)

    for available != 0 {
        pos := available & (-available)

        growOnNewWorker := func(onNewWorker bool) {
            grow(colBits|pos, (slashBits|pos)<<1, (backslashBits|pos)>>1, row+1, n, ctx)
            if onNewWorker {
                ctx.signalChan <- WORKER_FINISH
                ctx.pool.Release(1)
            }
        }

        if ctx.workingLevel == row+1 {
            // try to create a new worker. blocks until the semaphore is once again acquirable.
            if err := ctx.pool.Acquire(context.Background(), 1); err == nil {
                ctx.signalChan <- WORKER_START
                go growOnNewWorker(true)
            }
        } else {
            growOnNewWorker(false)
        }

        available &^= pos
    }
}

Following are some graphs that might help:

How grow() searches for possible solutions

On which level my workers are created

On a specific search path, previously taken columns and diagonals are recorded by setting the corresponding bits of colBits, slashBits, or backslashBits to 1. How to efficiently determine whether a queen can safely be placed at the current (row, col) position is irrelevant to the question; we can focus on the concurrent parts of the code.

The function to find the appropriate working level:

func getWorkingLevel(n int) (level, levelSize int) {
    branchesSum := 1
    for i := 0; i < n; i++ {
        branchesSum *= n

        if branchesSum >= MAX_THREADS {
            level, levelSize = i, branchesSum
            return
        }
    }
    return
}

The Second Solution

A pool of workers is created in advance; they wait on taskParamChan for new search tasks. The number of solutions found in each task is sent into ansChan by these workers.

Here's the control thread function Solve():

type cellParams struct {
    colBits       int
    slashBits     int
    bashSlashBits int
    row           int
}

// Solve() searches the solution space of the n-queens problem in parallel.
func Solve(n int) (ans int, err error) {

    // MAX_PROBLEM_SIZE is 32 because the diagonals need 2n-1 bits to encode while uint64 has only 64 bits.
    if n > MAX_PROBLEM_SIZE {
        err = fmt.Errorf("problem size exceeds limit(%d)", MAX_PROBLEM_SIZE)
        return
    }

    workingLevel, levelSize := getWorkingLevel(n)

    // at most levelSize tasks will be created in total,
    // so we make the channels buffered at this size
    taskParamChan := make(chan cellParams, levelSize)
    ansChan := make(chan int, levelSize)

    // create a (real) pool of workers waiting on new tasks
    for i := 0; i < MAX_THREADS; i++ {
        go solverWorker(taskParamChan, n, ansChan)
    }

    // kickstart the search
    remaining := grow(0, 0, 0, 0, n, &ans, taskParamChan, workingLevel)

    if remaining == 0 {
        return
    }

    for partial := range ansChan {
        ans += partial
        remaining--
        if remaining == 0 {
            close(ansChan)
            close(taskParamChan)
        }
    }

    return
}

The workers:

func solverWorker(taskParamChan chan cellParams, n int, ansChan chan<- int) {
    for v := range taskParamChan {
        partialAns := 0
        // there's no need to sync access to variable partialAns here as only the current worker thread modifies it.
        // return value of grow() is dropped because no new tasks will be created.
        grow(v.colBits, v.slashBits, v.bashSlashBits, v.row, n, &partialAns, nil, 0)
        ansChan <- partialAns
    }
}

Function grow() sends task parameters into paramChan at the workingLevel instead of creating a new goroutine. When paramChan is nil, the function searches within its own goroutine until all N-Queens solutions down the path are found.

func grow(
    colBits, slashBits, backslashBits int,
    row int, n int, pAns *int,
    paramChan chan cellParams, workingLevel int,
) (paramsSent int) {
    if row == n {
        *pAns++
        return
    }

    available := (1<<n - 1) &^ (colBits | slashBits | backslashBits)
    for available != 0 {
        pos := available & (-available)

        // create new tasks only on the working level
        if paramChan != nil && workingLevel == row+1 {
            paramChan <- cellParams{
                colBits:       colBits | pos,
                slashBits:     (slashBits | pos) << 1,
                bashSlashBits: (backslashBits | pos) >> 1,
                row:           row + 1,
            }
            paramsSent++
        } else {
            paramsSent += grow(colBits|pos, (slashBits|pos)<<1, (backslashBits|pos)>>1,
                row+1, n, pAns,
                paramChan, workingLevel,
            )
        }

        available &^= pos
    }

    return
}

My function to test the N-Queens solver:

func TestSolver(t *testing.T) {
    tests := []struct {
        problemSize int
        want        int
    }{
        {problemSize: 8, want: 92},
        {problemSize: 9, want: 352},
        {problemSize: 10, want: 724},
        {problemSize: 11, want: 2680},
        {problemSize: 12, want: 14200},
        {problemSize: 13, want: 73712},
        {problemSize: 14, want: 365596},
        {problemSize: 15, want: 2279184},
        {problemSize: 16, want: 14772512},
    }

    for _, tt := range tests {
        start := time.Now()
        if res, _ := nqueens.Solve(tt.problemSize); res != tt.want {
            t.Fatalf("got: %d, want: %d", res, tt.want)
        } else {
            t.Logf("Problem_size_%d test got %d in %d ms", tt.problemSize, res, time.Since(start).Milliseconds())
        }
    }
}

Test result of the first solution:

=== RUN   TestSolver
    solver_test.go:30: Problem Size 8 got 92 in 0 ms
    solver_test.go:30: Problem Size 9 got 352 in 1 ms
    solver_test.go:30: Problem Size 10 got 724 in 1 ms
    solver_test.go:30: Problem Size 11 got 2680 in 3 ms
    solver_test.go:30: Problem Size 12 got 14200 in 20 ms
    solver_test.go:30: Problem Size 13 got 73712 in 92 ms
    solver_test.go:30: Problem Size 14 got 365596 in 535 ms
    solver_test.go:30: Problem Size 15 got 2279184 in 3292 ms
    solver_test.go:30: Problem Size 16 got 14772512 in 21933 ms
--- PASS: TestSolver (25.88s)

And the second one:

=== RUN   TestSolver
    solver_test.go:30: Problem_size_8 test got 92 in 0 ms
    solver_test.go:30: Problem_size_9 test got 352 in 1 ms
    solver_test.go:30: Problem_size_10 test got 724 in 0 ms
    solver_test.go:30: Problem_size_11 test got 2680 in 0 ms
    solver_test.go:30: Problem_size_12 test got 14200 in 2 ms
    solver_test.go:30: Problem_size_13 test got 73712 in 5 ms
    solver_test.go:30: Problem_size_14 test got 365596 in 28 ms
    solver_test.go:30: Problem_size_15 test got 2279184 in 151 ms
    solver_test.go:30: Problem_size_16 test got 14772512 in 962 ms
--- PASS: TestSolver (1.15s)

I've also written a single-threaded implementation, which takes only 9.91 s to complete the same test, and that worries me...

My computer has 16 logical CPUs, so the maximum number of concurrently running goroutines in both solutions is 32 (I've set MAX_THREADS = runtime.NumCPU() * 2). Since workers are created dynamically in the first solution, I counted the total number of workers created during the tests. The number is just n * n, as implied by getWorkingLevel(), so I don't think the overhead of managing goroutines is the major problem (am I wrong?).

What is happening, and why does the efficiency differ so greatly between the two solutions?

And why is the first solution even slower than the single-threaded version?
