I am trying to solve the classic graph coloring problem using Python PuLP. We have n nodes and a collection of edges in the form edges = [(node1, node2), (node2, node4), ...], and we are trying to find the minimum number of node colors so that no two connected nodes share a color.
My implementation works, but it is slow. It consists of three families of constraints, plus one optimization: fixing node 0 to color 0 to somewhat limit the search space. The code is as follows:
from pulp import LpMinimize, LpProblem, LpVariable, lpSum

# node_count and edges are assumed to be defined by the input parsing
nodes = range(node_count)
# assumed upper bound on the number of colors; the model is infeasible if too small
n_colors = 10
# colors = range(node_count)
colors = range(n_colors)

prob = LpProblem("coloring", LpMinimize)

# variable xnc shows if node n has color c
xnc = LpVariable.dicts("x", (nodes, colors), cat='Binary')
# array of colors to indicate which ones were used
used_colors = LpVariable.dicts("used", colors, cat='Binary')

# minimize how many colors are used, and minimize int value for those colors
prob += lpSum([used_colors[c] * c for c in colors])
# prob += lpSum([used_colors[c] for c in colors])

# set the first node to color 0 to constrain starting point
prob += xnc[0][0] == 1

# Every node uses one color
for n in nodes:
    prob += lpSum([xnc[n][c] for c in colors]) == 1

# Any connected nodes have different colors
for e in edges:
    e1, e2 = e[0], e[1]
    for c in colors:
        prob += xnc[e1][c] + xnc[e2][c] <= 1

# mark color as used if node has that color
for n in nodes:
    for c in colors:
        prob += xnc[n][c] <= used_colors[c]

prob.solve()
I see that there are symmetries, and I know I could reduce them by requiring any newly used color to be at most max(colors_already_used) + 1, so that if node 0 is color 0, node 1 will either have the same color or color 1. But I am not sure how to encode this, because max is not allowed given the linear nature of the problem in PuLP, as far as I know. I achieve a similar effect above by multiplying each used color by its integer value, which speeds things up a bit, but I do not think it works as quite the efficient/deterministic constraint I seek.
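For reference, the "at most max(colors_already_used) + 1" idea can be written without max: node n may take color c >= 1 only if some lower-numbered node takes color c - 1. The sketch below is one possible encoding, not a tested speedup. The function name build_coloring, its parameters, and the extra used[c] <= used[c - 1] ordering constraint are my own assumptions for illustration, and it uses the plain color count as the objective.

```python
from pulp import LpMinimize, LpProblem, LpVariable, lpSum, PULP_CBC_CMD, value


def build_coloring(node_count, edges, n_colors):
    """Build a coloring model with two symmetry-breaking constraint families."""
    nodes = range(node_count)
    colors = range(n_colors)
    prob = LpProblem("coloring_sym", LpMinimize)
    x = LpVariable.dicts("x", (nodes, colors), cat="Binary")
    used = LpVariable.dicts("used", colors, cat="Binary")

    # Objective: number of colors used.
    prob += lpSum([used[c] for c in colors])

    for n in nodes:
        # Every node gets exactly one color.
        prob += lpSum([x[n][c] for c in colors]) == 1
        for c in colors:
            # A color counts as used if any node has it.
            prob += x[n][c] <= used[c]

    for a, b in edges:
        for c in colors:
            # Connected nodes have different colors.
            prob += x[a][c] + x[b][c] <= 1

    # Symmetry breaking 1: colors are used in index order.
    for c in range(1, n_colors):
        prob += used[c] <= used[c - 1]

    # Symmetry breaking 2: linear form of "at most max(colors so far) + 1".
    # Node n may use color c only if an earlier node uses color c - 1.
    # For n = 0 the sum is empty, which forces node 0 to color 0.
    for n in nodes:
        for c in range(1, n_colors):
            prob += x[n][c] <= lpSum([x[m][c - 1] for m in range(n)])

    return prob, x


# Small example: the triangle 0-1-2 forces three colors; node 3 hangs off node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
prob, x = build_coloring(4, edges, n_colors=4)
prob.solve(PULP_CBC_CMD(msg=False))
print(round(value(prob.objective)))
```

This keeps the model linear because any valid coloring can be relabeled by order of first appearance, so at least one optimal solution survives the extra constraints. It adds roughly node_count * n_colors rows, so whether it pays off probably depends on the instance.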
Also, limiting the number of colors seems to have a nice effect on the speed, but I am not sure whether it is worth the preprocessing cost of running a heuristic before starting the optimization, since it is not clear in advance how many colors will be needed.
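As a rough idea of what such a preprocessing step could look like, here is a minimal greedy sketch that visits nodes in order of decreasing degree and gives each the smallest color not used by its neighbors. The function name greedy_color_bound is hypothetical, and the result is only an upper bound on the number of colors, not necessarily the optimum.

```python
def greedy_color_bound(node_count, edges):
    """Return (number of colors used, coloring dict) from one greedy pass."""
    # Build adjacency sets from the edge list.
    adj = [set() for _ in range(node_count)]
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    # Visit high-degree nodes first; ties keep their original order.
    order = sorted(range(node_count), key=lambda n: len(adj[n]), reverse=True)

    color = {}
    for n in order:
        # Colors already taken by colored neighbors.
        taken = {color[m] for m in adj[n] if m in color}
        c = 0
        while c in taken:
            c += 1
        color[n] = c

    n_used = max(color.values()) + 1 if color else 0
    return n_used, color


edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
bound, coloring = greedy_color_bound(4, edges)
print(bound, coloring)
```

The pass is dominated by the sort and one scan over each node's neighbors, so it should be cheap next to the MIP solve. The returned bound could then replace the hard-coded n_colors = 10.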
What other constraints could I add, or in what other ways could I speed it up? I am mostly interested in better ways to formulate the problem, but I am also open to computational optimizations, such as parallelization, if they can be done in PuLP.
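On the computational side, the only knob I am aware of is passing solver options through PuLP, for example to the bundled CBC solver. The snippet below is a sketch under that assumption: solve_with_options is a hypothetical helper, the tiny model is only a stand-in, and whether threads actually helps depends on how the CBC binary was built.

```python
from pulp import LpMaximize, LpProblem, LpStatus, LpVariable, PULP_CBC_CMD


def solve_with_options(prob, threads=4, time_limit=60):
    """Solve prob with CBC, requesting several threads and a time limit in seconds."""
    solver = PULP_CBC_CMD(threads=threads, timeLimit=time_limit, msg=False)
    prob.solve(solver)
    return LpStatus[prob.status]


# Tiny stand-in model: maximize an integer y subject to 2*y <= 7.
demo = LpProblem("demo", LpMaximize)
y = LpVariable("y", lowBound=0, upBound=5, cat="Integer")
demo += y
demo += 2 * y <= 7
status = solve_with_options(demo)
print(status, y.value())
```

With a time limit set, the solver may stop before proving optimality, so the reported status and the objective value should be checked before trusting the color count.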