I have a pandas DataFrame object containing a column with a bag-of-words representation of text, stored as a 29881x23947 sparse matrix of type ''. The column was produced with sklearn's fit_transform() function.

I now want to transform this column into a 2D tensor with the convert_to_tensor() function.

x_train_tensor = tf.convert_to_tensor(x_train)

This gives the error message:

TypeError: Expected binary or unicode string

Which format is required to convert my matrix into a tensor?

Edit: printing the type of the column gives:

<class 'scipy.sparse.csr.csr_matrix'>

Example of the dataframe as output:

0          (0, 6276)\t1\n  (0, 8209)\t1\n  (0, 14299)\t...
1          (0, 6276)\t1\n  (0, 8209)\t1\n  (0, 14299)\t...
2          (0, 6276)\t1\n  (0, 8209)\t1\n  (0, 14299)\t...
3          (0, 6276)\t1\n  (0, 8209)\t1\n  (0, 14299)\t...
Benny Müller

1 Answer

Here is an example of converting a SciPy sparse matrix to a dense TensorFlow tensor.

Input sparse scipy matrix

import numpy as np
import tensorflow as tf
from scipy import sparse

A = np.array([[1, 2, 0], [0, 0, 3], [4, 0, 0]])
sA = sparse.csr_matrix(A)

print(sA)
# (0, 0)    1
# (0, 1)    2
# (1, 2)    3
# (2, 0)    4

idx, idy, val = sparse.find(sA)

print(idx, idy, val)
#[0 2 0 1] [0 0 1 2] [1 4 2 3]

To tensorflow

#merge idx and idy array to convert to [idx, idy] matrix    
full_indices = tf.stack([idx, idy], axis=1)

#Output matrix size
depth_x = 3
depth_y = 3

# sparse to dense matrix (TensorFlow 1.x API; deprecated in favor of tf.sparse.to_dense)
dense = tf.sparse_to_dense(full_indices, tf.constant([depth_x, depth_y]), val, validate_indices=False)

with tf.Session() as sess:
    print(sess.run(dense))
#[[1 2 0]
# [0 0 3]
# [4 0 0]]
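
The TensorFlow 1.x calls above (tf.sparse_to_dense, tf.Session) are not available in the default TensorFlow 2.x namespace. A minimal sketch of the same conversion for TensorFlow 2.x with eager execution might look like the following. The helper name scipy_to_tf_sparse is hypothetical, not part of either library, and the sketch assumes the input is a SciPy sparse matrix that is not wrapped in a pandas column.

```python
import numpy as np
import tensorflow as tf
from scipy import sparse


def scipy_to_tf_sparse(matrix):
    """Hypothetical helper: SciPy sparse matrix -> tf.sparse.SparseTensor."""
    coo = matrix.tocoo()  # COO format exposes row, col and data arrays
    # SparseTensor expects an int64 [nnz, 2] array of (row, col) pairs
    indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)
    st = tf.sparse.SparseTensor(indices, coo.data, coo.shape)
    # Put the indices into row-major order, which tf.sparse.to_dense expects
    return tf.sparse.reorder(st)


A = np.array([[1, 2, 0], [0, 0, 3], [4, 0, 0]])
sA = sparse.csr_matrix(A)

sparse_tensor = scipy_to_tf_sparse(sA)
dense = tf.sparse.to_dense(sparse_tensor)
print(dense.numpy())
```

For a matrix as large as the 29881x23947 one in the question, keeping the tf.sparse.SparseTensor and densifying only when needed is probably the more memory-friendly choice.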
Vijay Mariappan
  • Using sparse.find() gives me the error message: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). – Benny Müller May 31 '18 at 13:19
  • Can you try this method: https://stackoverflow.com/questions/10360210/access-value-column-index-and-row-ptr-data-from-scipy-csr-sparse-matrix – Vijay Mariappan May 31 '18 at 13:39
  • Well, both your method and the other one work before the matrix is added to the pandas DataFrame, so it seems I have to rewrite some code. – Benny Müller May 31 '18 at 14:01
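
As the last comment suggests, the conversion works on the matrix itself but not on the pandas column. One possible way to recover a single matrix from such a column is sketched below. It assumes that every cell holds one 1xN csr_matrix row, which the question does not confirm; the column name "bow" and the small example matrix are made up for illustration. If the cells hold something else, this sketch does not apply, and keeping the matrix outside the DataFrame is likely simpler.

```python
import numpy as np
import pandas as pd
from scipy import sparse

X = sparse.csr_matrix(np.array([[1, 2, 0], [0, 0, 3], [4, 0, 0]]))

# Assumed layout: one 1xN csr_matrix per DataFrame row (hypothetical column "bow")
cells = np.empty(X.shape[0], dtype=object)
for i in range(X.shape[0]):
    cells[i] = X[i]  # X[i] is a 1xN csr_matrix
df = pd.DataFrame({"bow": cells})

# Stack the per-row matrices back into a single sparse matrix
restored = sparse.vstack(df["bow"].tolist(), format="csr")
print(restored.shape)
```

The restored matrix can then be passed to sparse.find() or converted to a tensor like any other SciPy sparse matrix.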