
I have an SFrame like this:

import sframe as sf

x = sf.SFrame({'users': [{'123': 1.0, '122': 5},
                         {'134': 3.0, '123': 10}]})

I want to convert it into a scipy.sparse csr_matrix without invoking GraphLab Create, using only sframe and Python.

How can I do this?

Erba Aitbayev
dvshekar

1 Answer


Assuming you want the row number to be the row index in the output sparse matrix, the only tricky step is using SFrame.stack to unpack each dictionary into one row per key-value pair; from there you can construct a csr_matrix directly.

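import sframe as sf
from scipy.sparse import csr_matrix

x = sf.SFrame({'users': [{'123': 1.0, '122': 5},
                         {'134': 3.0, '123': 10}]})
# Number the rows so each entry remembers which matrix row it came from.
x = x.add_row_number('row_id')
# Unpack each dict into one (row_id, key, value) row per dict entry.
x = x.stack('users', new_column_name=['col', 'value'])
# The dict keys are strings, so cast them to integer column indices.
x['col'] = x['col'].astype(int)
A = csr_matrix((list(x['value']), (list(x['row_id']), list(x['col']))),
               shape=(2, 135))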
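For readers without sframe installed, here is a minimal sketch of the same transformation in plain Python, assuming the data is already available as a list of dicts. The loop plays the role of add_row_number plus stack: it emits one (row, column, value) triplet per dictionary entry and casts the string keys to integer column indices, which is what csr_matrix expects.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same data as the question, as a plain list of dicts (one dict per matrix row).
users = [{'123': 1.0, '122': 5},
         {'134': 3.0, '123': 10}]

# Mimic add_row_number + stack: one (row, col, value) triplet per dict entry.
rows, cols, vals = [], [], []
for row_id, d in enumerate(users):
    for key, value in d.items():
        rows.append(row_id)
        cols.append(int(key))    # keys are strings; column indices must be ints
        vals.append(float(value))

A = csr_matrix((vals, (rows, cols)), shape=(2, 135))
print(A[0, 123], A[1, 123], A[1, 134])
```

Each dict entry becomes exactly one stored element, so A holds four nonzeros.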

I'm also hard-coding the dimensions of the matrix here; you'd probably want to compute them programmatically.
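One way to do that, sketched with NumPy on the triplets that the stack step produces for the question's data (hard-coded below for illustration): the smallest shape that fits every entry is the largest index in each dimension plus one. With the sframe version, calling SArray.max() on the row-id column and on the integer-cast key column should give the same numbers, assuming the keys have already been converted to int.

```python
import numpy as np
from scipy.sparse import csr_matrix

# (row_id, integer column key, value) triplets for the question's two dicts.
rows = np.array([0, 0, 1, 1])
cols = np.array([123, 122, 134, 123])
vals = np.array([1.0, 5.0, 3.0, 10.0])

# Largest index + 1 in each dimension is the smallest shape that fits every entry.
n_rows = int(rows.max()) + 1
n_cols = int(cols.max()) + 1
A = csr_matrix((vals, (rows, cols)), shape=(n_rows, n_cols))
print(A.shape)  # (2, 135)
```

If you omit shape entirely, scipy infers the same dimensions from the index arrays; passing it explicitly is mainly useful when you want extra empty rows or columns.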

papayawarrior
  • Thanks, this was helpful, but I still have a problem. I have users_len = 1444418322, and the following did not work with Python on an Ubuntu server: df_csr = csr_matrix((df_temp['Total'], (df_temp['ClusterId'], df_temp['UserId'])), shape=(100, users_len+1)). However, it did work on a Red Hat C3 cluster. I think it may have to do with a 64-bit setting for Python or an incorrect version of a library such as sklearn. Please advise. – dvshekar Feb 16 '16 at 00:36
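The thread does not resolve this, so the following is only a hedged starting point. It checks two things the comment suspects: whether the interpreter is 64-bit, and which library versions are installed. It also shows that the requested column count (1,444,418,323) is below the int32 limit of 2,147,483,647, so index width alone may not explain the difference. As an alternative technique that is not from the original answer, it remaps sparse user IDs to dense column indices with numpy.unique, so the matrix has only as many columns as distinct users. The sample IDs and totals are hypothetical stand-ins for df_temp.

```python
import sys
import numpy as np
import scipy
from scipy.sparse import csr_matrix

# 64-bit CPython reports sys.maxsize == 2**63 - 1; 32-bit reports 2**31 - 1.
print("64-bit Python:", sys.maxsize > 2**32)
print("numpy", np.__version__, "scipy", scipy.__version__)

users_len = 1444418322
# Compare the column count requested in the comment with the int32 maximum.
print("fits in int32:", users_len + 1 <= np.iinfo(np.int32).max)

# Hypothetical sample standing in for df_temp's columns.
user_ids = np.array([7, 1444418322, 7, 99])
cluster_ids = np.array([0, 0, 3, 5])
totals = np.array([2.0, 1.0, 4.0, 3.0])

# Map each distinct user ID to a dense column index 0..k-1 (sorted by ID).
unique_users, col_idx = np.unique(user_ids, return_inverse=True)
M = csr_matrix((totals, (cluster_ids, col_idx)),
               shape=(100, len(unique_users)))
print(M.shape)  # (100, 3)
```

With this layout, unique_users[j] gives the original UserId for column j, so you can translate results back after any computation on M.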