CPLEX with Python API - how to make model formulating faster?

Question

I turned to CPLEX, since I have quite a big linear problem to solve.
If we use scipy.optimize.linprog notation:

minimize: c^T * x
subject to: A_ub * x <= b_ub and A_eq * x == b_eq,

then my A_ub matrix has a shape of roughly (20000, 10000): 20000 constraints, 10000 variables.
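For readers less familiar with this notation, here is a toy instance passed straight to `scipy.optimize.linprog`, which accepts the matrices directly. The numbers are made up for illustration only, and `method="highs"` assumes SciPy 1.6 or newer.

```python
import numpy as np
from scipy.optimize import linprog

# Toy instance in the notation above (illustrative numbers, not from the question):
#   minimize    x0 + 2*x1
#   subject to  x0 + x1 >= 1   (rewritten as -x0 - x1 <= -1 to fit A_ub x <= b_ub)
#               x0 - x1 == 0.5
#               x0, x1 >= 0    (linprog's default bounds)
c = np.array([1.0, 2.0])
A_ub = np.array([[-1.0, -1.0]])
b_ub = np.array([-1.0])
A_eq = np.array([[1.0, -1.0]])
b_eq = np.array([0.5])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
print(res.status, res.fun, res.x)
```

The optimum is at x1 = 0.25, x0 = 0.75, with objective value 1.25.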

It is very fast to construct matrices A_ub, A_eq and vectors c, b_ub, b_eq using numpy.
But creating a CPLEX problem from them takes around 30 seconds, which is not acceptable in my situation. This is because the CPLEX Python API cannot take a matrix as input (at least, I could not find such functionality after a couple of days of testing different scenarios).
The only way to create a problem is to construct it either column by column or row by row, like this:

import cplex

# c, A_ub and b_ub are numpy arrays built beforehand
problem = cplex.Cplex()
problem.set_problem_type(problem.problem_type.LP)
problem.objective.set_sense(problem.objective.sense.minimize)
problem.variables.add(obj=list(c))

n_constraints, n_vars = A_ub.shape
index = list(range(n_vars))
list_rhs = list(b_ub)
# for each row (constraint) create a SparsePair instance holding all n_vars coefficients
sparse_pairs = [cplex.SparsePair(ind=index, val=A_ub[i].tolist()) for i in range(n_constraints)]

# this piece takes 30 seconds
problem.linear_constraints.add(
    lin_expr=sparse_pairs,
    rhs=list_rhs,
    senses=['L'] * n_constraints
)

I also tried to do it column by column and by directly filling coefficients, but everything is equally slow.
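A column-by-column build can be sketched as follows. This is a minimal sketch, not the asker's code: it assumes `A_ub` can be converted to a SciPy CSC matrix, that `problem` is an empty `cplex.Cplex()` instance (so constraint indices start at 0), and it relies on `variables.add(columns=...)` accepting plain `[indices, values]` pairs in place of `SparsePair` objects. The helper names `sparse_columns` and `build_by_columns` are hypothetical. Only nonzero coefficients are passed, which is where any saving is likely to come from if the matrix is not dense.

```python
import scipy.sparse as sp


def sparse_columns(A):
    """Return one [row_indices, values] pair per column of A, keeping only nonzeros."""
    csc = sp.csc_matrix(A, copy=True)  # copy so the caller's matrix is not modified
    csc.sum_duplicates()
    csc.eliminate_zeros()
    cols = []
    for j in range(csc.shape[1]):
        start, end = csc.indptr[j], csc.indptr[j + 1]
        cols.append([csc.indices[start:end].tolist(), csc.data[start:end].tolist()])
    return cols


def build_by_columns(problem, c, A_ub, b_ub):
    """Hypothetical helper: add rows first (rhs and sense only), then the variables with their columns."""
    rhs = [float(b) for b in b_ub]
    problem.linear_constraints.add(rhs=rhs, senses=["L"] * len(rhs))
    # each column lists the constraint rows in which this variable has a nonzero coefficient
    problem.variables.add(obj=[float(v) for v in c], columns=sparse_columns(A_ub))
```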

I cannot believe it is normal that formulating a problem takes 6-7 times longer than actually solving it (solving takes 4-5 seconds). Does anybody know whether there is a faster way to create a problem in CPLEX?
Currently, it is faster to solve the problem with cvxopt using the open-source GLPK solver (15 seconds), because cvxopt takes matrices directly as input, like scipy.linprog.

P.S. I also checked Gurobi's Python API, and it has the same problem (it is even slower).

One thing that will help a little is turning the [datacheck parameter](https://www.ibm.com/support/knowledgecenter/SSSA5P_12.8.0/ilog.odms.cplex.help/CPLEX/Parameters/topics/DataCheck.html) "off". With the CPLEX Python API it is "on" by default. This can be done by adding `problem.parameters.read.datacheck.set(problem.parameters.read.datacheck.values.off)` just after creating `problem`. — rkersh, Jun 08 '18 at 20:24
@rkersh, thank you! This does not solve the problem fully, but at least it decreased the preparation time by roughly 40%. So, now the whole process takes around 22 seconds (down from 30). But it's still slower than open-source GLPK. — Bi0max, Jun 11 '18 at 14:13
By the way, is your matrix completely dense (i.e., there are no zeros)? If it is not, and you can build the sparse arrays directly, you should get better performance. Also, it is marginally faster to build a model by columns rather than by rows. — rkersh, Jun 11 '18 at 15:38
With "build the sparse arrays directly" above, I meant that you could use [scipy.sparse.csr_matrix](https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html) and friends, for example. — rkersh, Jun 11 '18 at 15:47
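Following that suggestion, a row-wise build that passes only the nonzero entries could look like the sketch below. It assumes the variables have already been added as in the question, and that `linear_constraints.add` accepts plain `[indices, values]` pairs instead of `cplex.SparsePair` instances; `sparse_rows` and `add_le_constraints` are hypothetical helper names. No timings are claimed here: any gain depends on how many zeros `A_ub` actually contains.

```python
import scipy.sparse as sp


def sparse_rows(A):
    """Return one [column_indices, values] pair per row of A, keeping only nonzeros."""
    csr = sp.csr_matrix(A, copy=True)  # copy so the caller's matrix is not modified
    csr.sum_duplicates()
    csr.eliminate_zeros()
    rows = []
    for i in range(csr.shape[0]):
        start, end = csr.indptr[i], csr.indptr[i + 1]
        rows.append([csr.indices[start:end].tolist(), csr.data[start:end].tolist()])
    return rows


def add_le_constraints(problem, A_ub, b_ub):
    """Hypothetical helper: append A_ub x <= b_ub to an existing cplex.Cplex problem."""
    rows = sparse_rows(A_ub)
    problem.linear_constraints.add(
        lin_expr=rows,  # [indices, values] pairs used in place of SparsePair objects
        rhs=[float(b) for b in b_ub],
        senses=["L"] * len(rows),
    )
```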

score 5 · Answer 1 · answered Jun 06 '18 at 09:12

Docplex has a CplexTransformer class (in the docplex.mp.sktrans.transformers module) which builds and solves a linear model from a matrix and a cost vector. It accepts numpy matrices, pandas DataFrames, and SciPy sparse COO matrices (for very sparse matrices, a COO formulation can really help).

Here is a very small code snippet showing use of CplexTransformer:

# ----- a very small CplexTransformer example
from docplex.mp.sktrans.transformers import CplexTransformer
import scipy.sparse as sp


def solve_cpxtrans_sparse_coo():
    xs = [0, 0, 1, 1, 0, 1]
    ys = [0, 1, 1, 2, 3, 3]
    dd = [1, 1, 1, 1, 5, 7]
    spm = sp.coo_matrix((dd, (xs, ys)), shape=(2, 4))
    ubs = 10
    res = CplexTransformer(sense="min").transform(spm, y=[3, 2, 1], ubs=ubs, sense='ge')
    print(res)
    xs = res['value'].tolist()
    print(xs)

Unfortunately, it seems that this class is just a convenience wrapper. Under the hood, it again decomposes the numpy/scipy/pandas structures into rows, creates the problem, and solves it. It takes the same amount of time as my implementation without `CplexTransformer`. — Bi0max, Jun 07 '18 at 12:25

score 1 · Answer 2 · answered Feb 28 '19 at 02:02

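Suppose A_ub is a SciPy sparse matrix; converting it to COO format gives direct access to its `row`, `col` and `data` arrays:

A_ub = A_ub.tocoo()  # make sure the matrix is in COO format
a_rows = A_ub.row.tolist()
a_cols = A_ub.col.tolist()
a_vals = A_ub.data.tolist()
list_rhs = list(b_ub)
# first add the rows with only their right-hand sides and senses ...
problem.linear_constraints.add(rhs=list_rhs, senses=['L'] * len(list_rhs))
# ... then fill in the nonzero coefficients as a list of (row, col, value) triples
problem.linear_constraints.set_coefficients(list(zip(a_rows, a_cols, a_vals)))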

You may want to try it; it works better in my case.
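A slightly more defensive variant of this answer, as a sketch: it converts the input to COO format itself, merges duplicate entries, drops stored zeros, and shifts the row indices by `linear_constraints.get_num()` so that it also works when the problem already has constraints (for example the `A_eq` rows). `coo_triples` and `add_le_rows_by_triples` are hypothetical names, and `problem` is assumed to be a `cplex.Cplex` instance whose variables already exist.

```python
import scipy.sparse as sp


def coo_triples(A, row_offset=0):
    """Return (row, col, value) triples for the nonzeros of A, with rows shifted by row_offset."""
    coo = sp.coo_matrix(A, copy=True)  # copy so the caller's matrix is not modified
    coo.sum_duplicates()   # merge repeated (row, col) entries
    coo.eliminate_zeros()  # drop explicitly stored zeros
    rows = (coo.row + row_offset).tolist()
    return list(zip(rows, coo.col.tolist(), coo.data.tolist()))


def add_le_rows_by_triples(problem, A_ub, b_ub):
    """Hypothetical helper: append A_ub x <= b_ub after any constraints already present."""
    offset = problem.linear_constraints.get_num()
    rhs = [float(b) for b in b_ub]
    # create empty rows carrying only rhs and sense ...
    problem.linear_constraints.add(rhs=rhs, senses=["L"] * len(rhs))
    # ... then set every nonzero coefficient in a single call
    problem.linear_constraints.set_coefficients(coo_triples(A_ub, row_offset=offset))
```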

score 0 · Answer 3 · answered May 03 '19 at 18:07


The same issue was discussed here: https://github.com/cvxgrp/cvxpy/issues/617. There still seems to be no solution to this problem.

answered by Jongmmm

Ryan's question about whether the matrix is fully dense was never answered. If the matrix is not fully dense but contains a good number of zeros, things will work a lot faster if the SparsePair instances are built without any zeros in them. Creating the matrices as sparse matrices (e.g. with scipy.sparse) may make this simpler. – Daniel Junglas May 04 '19 at 21:40
