
(First question, will edit if not good in some way. Did research prior to posting)

I want to find C such that x·C ≈ y (x and y are data matrices, C is the matrix to fit), with the constraints that the rows of C sum to 1 and that its elements are between 0 and 1.

Because the constraint is on the rows, not the columns, I can't just use ordinary linear regression; I have to write down the error function myself. I did this successfully in Matlab, so I know the problem isn't in the data or the method; it's probably in my code.

My code (below) gives one of these two errors (depending on the random initial guess, I assume):

More than 3*n iterations in LSQ subproblem    (Exit mode 3)
Inequality constraints incompatible    (Exit mode 4)

Any help would be greatly appreciated. I'm new to Python and spent a lot of time on this.

import numpy as np
from scipy.optimize import minimize

M1 = data_2013.shape[1]
M2 = data_2015.shape[1]

def error_function(C):
    # Sum of squared residuals of data_2013 @ C against data_2015
    C = C.reshape(M1, M2)
    return np.sum((np.dot(data_2013, C) - data_2015) ** 2)

def between_zero_and_one(x):
    # Nonnegative iff every element is in [0, 1]
    return x * (1 - x)

def eq_constraint(x):
    # Each row of C must sum to 1
    x = x.reshape(M1, M2)
    return x.sum(axis=1) - 1

cons = [{'type': 'ineq', 'fun': between_zero_and_one},
        {'type': 'eq', 'fun': eq_constraint}]

C0 = np.random.rand(M1, M2)
result = minimize(error_function, C0, constraints=cons,
                  options={'disp': True, 'maxiter': 10000})
Itamar Mushkin
  • Add the data or create reproducible synthetic data (show code we can run!). Mathematically it's still unclear to me exactly what you want to do. That being said, both errors seem easy to tackle. Furthermore, `between_zero_and_one` looks as bad as it gets to me. And one more thing: this is likely a convex optimization problem, and you would be wise to use a specialized solver (e.g. through cvxpy). – sascha Dec 10 '17 at 01:34
  • 1. Is linking to a Kaggle notebook acceptable? It's a lot of data (around 1000-by-30), and the problem does not reproduce on a very small (e.g. 2-by-3) dataset. 2. Mathematically, it's like a Markov matrix: I have a lot of state vectors, each undergoing the (allegedly) same linear transformation, which preserves their sum and has only positive values. Does this clarify? 3. Can you elaborate on what's wrong with `between_zero_and_one`? Would it work better as two different constraints? 4 (and most importantly): it is indeed convex. I didn't know cvxpy; I'll read through it. – Itamar Mushkin Dec 10 '17 at 06:15
  • Yes. Use variable bounds if supported, or two inequalities for constraining vars between 0 and 1. – sascha Dec 10 '17 at 12:50
  • `cvxpy` worked and gave a proper result! – Itamar Mushkin Dec 12 '17 at 17:47
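Following sascha's suggestion, here is a minimal, runnable sketch of the scipy route on hypothetical synthetic data (random matrices standing in for `data_2013`/`data_2015`): the `x*(1-x)` inequality is dropped in favor of per-variable bounds, which SLSQP handles natively.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical synthetic stand-ins for data_2013 / data_2015
M1, M2 = 3, 2
C_true = rng.random((M1, M2))
C_true /= C_true.sum(axis=1, keepdims=True)   # rows sum to 1, entries in [0, 1]
x = rng.random((50, M1))
y = x @ C_true

def error_function(c_flat):
    C = c_flat.reshape(M1, M2)
    return np.sum((x @ C - y) ** 2)

def eq_constraint(c_flat):
    # Each row of C must sum to 1
    return c_flat.reshape(M1, M2).sum(axis=1) - 1

# Bounds replace the x*(1-x) inequality: each entry of C stays in [0, 1]
bounds = [(0.0, 1.0)] * (M1 * M2)
C0 = np.full(M1 * M2, 1.0 / M2)               # feasible starting point

result = minimize(error_function, C0, method='SLSQP',
                  bounds=bounds,
                  constraints=[{'type': 'eq', 'fun': eq_constraint}],
                  options={'maxiter': 10000})
C_hat = result.x.reshape(M1, M2)
```

On this noise-free toy problem SLSQP recovers `C_true`; a feasible starting point (rows already summing to 1) also helps avoid the "inequality constraints incompatible" exit.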

1 Answer


Sascha's answer helped me - the problem converged well with cvxpy.

Code:

import cvxpy as cvx

M1 = x_data.shape[1]
M2 = y_data.shape[1]
C = cvx.Variable(M1, M2)  # pre-1.0 cvxpy API; the shape is a tuple in cvxpy >= 1.0
constraints = [0 <= C, C <= 1, cvx.sum_entries(C, axis=1) == 1]
objective = cvx.Minimize(cvx.norm((x_data.values * C) - y_data.values))
prob = cvx.Problem(objective, constraints)
prob.solve()
C_mat = C.value

Thanks, Sascha!

Itamar Mushkin
  • I recommend using `cvx.norm(x)` instead of `cvx.sum_entries(cvx.square(x))` (there is even a dedicated `cvx.sum_squares()` function). This will be more robust and does not change the solution vector. – sascha Dec 12 '17 at 18:15