
I am working on a large quadratic programming problem. I would like to feed the Q matrix that defines the objective function into IBM's CPLEX using the Python API. The Q matrix is built as a scipy lil_matrix because it is sparse. Ideally, I would like to pass this matrix to CPLEX directly. Does CPLEX accept a scipy lil_matrix?

I can convert Q to the list-of-lists format that CPLEX accepts; let's call it qMat. But qMat becomes so large that the machine runs out of memory (even with 120 GB).

Below is my work-in-progress code. In the actual problem, n is around half a million and m is around 5 million, and Q is given rather than randomly generated as in the example below.

```python
import random
import cplex
from scipy import sparse

n = 10
m = 5

def create():
    # Build a random symmetric sparse matrix: an m x m dense block
    # placed at m randomly chosen row/column indices.
    Q = sparse.lil_matrix((n, n))
    nums = random.sample(range(0, n), m)
    for i in nums:
        for j in nums:
            a = random.uniform(0, 1)
            Q[i, j] = a
            Q[j, i] = a
    return Q

def convert(Q):
    # Repack the lil_matrix internals into the [indices, values]
    # list-of-lists format that set_quadratic expects, one entry per row.
    qMat = [[[], []] for _ in range(n)]
    for k in range(n):
        qMat[k][0] = Q.rows[k]
        qMat[k][1] = Q.data[k]
    return qMat

Q = create()
qMat = convert(Q)
my_prob = cplex.Cplex()
# Variables must exist before the quadratic coefficients can be set.
my_prob.variables.add(lb=[-cplex.infinity] * n, ub=[cplex.infinity] * n)
my_prob.objective.set_quadratic(qMat)
```
user58925
  • No, the CPLEX Python API does not accept scipy lil matrices. [docplex](http://ibmdecisionoptimization.github.io/docplex-doc/index.html) is numpy friendly and would probably accept them, but it sits on top of the CPLEX Python API so the same conversion would have to occur. Ultimately, the input data also has to be converted into native C arrays before it is passed to the underlying CPLEX engine. – rkersh Sep 18 '18 at 21:16
  • As a side note, to get rid of some overhead, you can reduce your `convert` function to `def convert(Q): return [[Q.rows[k], Q.data[k]] for k in range(n-1)]`. – rkersh Sep 18 '18 at 21:19
  • It's too late for me to edit the first comment, but note that `docplex` can solve on the cloud, in which case the local conversion could potentially be bypassed. A final thought: you could write your model out to disk in LP format and then read it back in, rather than constructing it all at once in memory. – rkersh Sep 18 '18 at 21:26

1 Answer


If n = 500000 and m = 5000000, then that is 2.5e12 non-zeroes. For each of these you'd need roughly one double for the non-zero value and one CPXDIM (a 32-bit integer) for the index. That is 8 + 4 = 12 bytes per non-zero, which gives:

```python
>>> print(2.5e12 * 12 / 1024. / 1024. / 1024.)
27939.6772385
```

Roughly 28,000 GiB, or about 27 TiB of memory! It's not clear exactly how many non-zeroes you plan on having, but with this calculation you can easily work out whether what you're asking is even possible.
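For reference, here is the same arithmetic as a tiny helper function (the name and defaults are mine, not part of any CPLEX API), so you can plug in your own non-zero count:

```python
def qp_memory_gib(nnz, bytes_per_value=8, bytes_per_index=4):
    """Rough lower bound, in GiB, on the memory needed for nnz
    quadratic non-zeroes (one double plus one 32-bit index each)."""
    return nnz * (bytes_per_value + bytes_per_index) / 1024.0 ** 3

print(qp_memory_gib(2.5e12))  # ~27939.68, matching the figure above
```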

As mentioned in the comments, the CPLEX Python API does not accept scipy lil matrices. You could try docplex, which is numpy-friendly, or you could even try generating an LP file directly.
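To sketch the LP-file idea (this is my own illustration, not a tested implementation: in CPLEX's LP format the quadratic objective goes inside `[ ... ] / 2`, so each stored off-diagonal entry of a symmetric Q is written once with twice its value; double-check the LP-format documentation for your CPLEX version, and note the sketch assumes non-negative coefficients, so it skips sign handling):

```python
def write_lp(Q, n, path="model.lp"):
    # Stream the objective to disk row by row instead of building one
    # giant list in memory.
    with open(path, "w") as f:
        f.write("Minimize\n obj: [\n")
        first = True
        for i, (cols, vals) in enumerate(zip(Q.rows, Q.data)):
            for j, v in zip(cols, vals):
                if j < i:
                    continue  # write each symmetric pair only once
                coef = v if i == j else 2.0 * v
                term = "x%d ^2" % i if i == j else "x%d * x%d" % (i, j)
                f.write("  %s %.12g %s\n" % (" " if first else "+", coef, term))
                first = False
        f.write(" ] / 2\nBounds\n")
        for i in range(n):
            f.write(" x%d free\n" % i)
        f.write("End\n")

# Read it back rather than constructing the model in memory:
# my_prob = cplex.Cplex("model.lp")
```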

Using something like the following is probably your best bet in terms of reducing the conversion overhead (I think I made an off-by-one error in the comments section above):

```python
my_prob.objective.set_quadratic(list(zip(Q.rows, Q.data)))
```

or

```python
my_prob.objective.set_quadratic([[row, data] for row, data in zip(Q.rows, Q.data)])
```

At any rate, you should play with these to see what gives the best performance (in terms of speed and memory).
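For completeness, here is a minimal end-to-end sketch with toy data of my own (the bounds and the identity Q are placeholders); note that the variables must be added before set_quadratic is called, and the outer list must have one entry per variable:

```python
import cplex
from scipy import sparse

n = 10
Q = sparse.lil_matrix((n, n))
Q.setdiag(1.0)  # toy positive-definite objective: 0.5 * sum(x_i^2)

my_prob = cplex.Cplex()
# Add the variables first; set_quadratic needs them to exist.
my_prob.variables.add(lb=[-cplex.infinity] * n, ub=[cplex.infinity] * n)
my_prob.objective.set_quadratic(list(zip(Q.rows, Q.data)))
my_prob.objective.set_sense(my_prob.objective.sense.minimize)
my_prob.solve()
```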

rkersh