
Requirement

Given a sparse tensor like:

SparseTensorValue(indices=array([[0, 0], [1, 0], [1, 1], [1, 2]]),
                  values=array([2, 0, 2, 5]),
                  dense_shape=array([2, 3]))

Its shape is 2x3 (na marks an entry that is not stored):

| 2 na na |
| 0  2  5 |

I need a new tensor in which each stored value becomes a column index holding a 1, as shown below.

There are 6 possible values in total (the set [0, 1, 2, 3, 4, 5]), so the new shape is 2x6:

| 0 0 1 0 0 0 |
| 1 0 1 0 0 1 |

Written out directly, the target tensor is:

SparseTensorValue(indices=array([[0, 2], [1, 0], [1, 2], [1, 5]]),
                  values=array([1, 1, 1, 1]),
                  dense_shape=array([2, 6]))
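
Outside the computation graph, the mapping can be sketched in plain Python: each stored entry at (row, col) with value v becomes a 1 at (row, v). This is only an illustration of the requirement; the helper name `to_multi_hot` and the `num_rows` / `num_classes` parameters are hypothetical and not part of TensorFlow.

```python
def to_multi_hot(indices, values, num_rows, num_classes):
    """Turn sparse (row, col) -> value entries into multi-hot rows.

    `to_multi_hot`, `num_rows` and `num_classes` are hypothetical names
    used only for this sketch.
    """
    # Each stored value becomes the column index of a 1 in the same row.
    new_indices = [[row, value] for (row, _col), value in zip(indices, values)]
    # Build the dense 0/1 matrix so the result can be inspected by eye.
    dense = [[0] * num_classes for _ in range(num_rows)]
    for row, col in new_indices:
        dense[row][col] = 1
    return new_indices, dense


indices = [[0, 0], [1, 0], [1, 1], [1, 2]]
values = [2, 0, 2, 5]
new_indices, dense = to_multi_hot(indices, values, num_rows=2, num_classes=6)
print(new_indices)  # [[0, 2], [1, 0], [1, 2], [1, 5]]
print(dense)        # [[0, 0, 1, 0, 0, 0], [1, 0, 1, 0, 0, 1]]
```

The open question is how to express the same pairing with tensor operations inside the graph.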

How can I do this the TensorFlow way? Neither approach below works:

import tensorflow as tf

tags = tf.SparseTensor(indices=[[0, 0], [1, 0], [1, 1], [1, 2]],
                       values=[2, 0, 2, 5],
                       dense_shape=[2, 3])

print(type(tags.indices))

# approach 1: the Python logic I would like to express in TensorFlow
new_indices = [[tags.indices[i], tags.values[i]]
               for i in range(tags.values.shape[0])]  # does not work

# approach 2:
indice_idx = tf.map_fn(lambda x : x[0], tags.indices)
value_idx = tf.map_fn(lambda x : x[1], tags.indices)
value_arr = tf.gather(tags.values, value_idx)

with tf.Session() as s1:
    print(indice_idx.eval())
    print(tags.values.eval())
    print('value_arr', value_arr.eval())


"""
[0 0 1 2]   <-- value_idx, which is the index of tags.values

want to combine
[0 1 1 1]   <-- indice_idx
[2 2 0 2]   <-- value_arr, which is the value of tags.values
==>
[[0,2], [1,2], [1,0], [1,2]]
"""
new_indices = tf.concat(indice_idx, value_arr)  # fails: tf.concat takes a list of tensors and an axis

with tf.Session() as s:
    s.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(s.run(value_arr))
    print(s.run(tags.values))
    print(s.run(new_indices))
    print(s.run(tags.indices[3, 1]))
fengda
  • What if a value is repeated in a given row? Also, sparse matrices assume the unfilled elements are zero typically, so for example, the value '0' would be repeated a huge number of times for a general row of a sparse matrix. – ely Apr 11 '18 at 18:17
  • It's guaranteed not to be repeated, and the input data itself is a sparse matrix. – fengda Apr 11 '18 at 18:29
  • Your question is still unclear. Are you looking for how to construct the middle tensor? Or are you looking for a `k x 2` tensor, with a separate row for each (row index, value) pair in the original? – ely Apr 11 '18 at 18:49
  • It's to construct a new Tensor as described above. I'm looking for an appropriate way to loop over the Tensor's `indices` and `values`. – fengda Apr 11 '18 at 18:56
  • Inside the computation graph, or separately as a `SparseTensorValue` outside of the computation graph? For outside, you can just iterate using the `.indices` and `.values` attributes, yes? – ely Apr 11 '18 at 18:58
  • @ely that's what I was asking for. I've updated the post by including the sample code. – fengda Apr 11 '18 at 20:09

1 Answer


Answer

In approach 2, use tf.stack instead of tf.concat: new_indices = tf.stack([indice_idx, value_arr], axis=1)

The full code is:

import tensorflow as tf

tags = tf.SparseTensor(indices=[[0, 0], [1, 0], [1, 1], [1, 2]],
                       values=[2, 0, 2, 5],
                       dense_shape=[2, 3])

print(type(tags.indices))

# # approach 1:  any TensorFlow way to implement the Python logic below?
# new_indices = [[tags.indices[i], tags.values[i]]
#                for i in range(tags.values.shape[0])]  # syntax incorrect

# approach 2:
indice_idx = tf.map_fn(lambda x : x[0], tags.indices)
value_idx = tf.map_fn(lambda x : x[1], tags.indices)
value_arr = tf.cast(tf.gather(tags.values, value_idx), tf.int64)

with tf.Session() as s1:
    print(indice_idx.eval())
    print(tags.values.eval())
    print('value_arr', value_arr.eval())


"""
[0 0 1 2]   <-- value_idx, which is the index of tags.values

tf.stack does:
[0 1 1 1]   <-- indice_idx
[2 2 0 2]   <-- value_arr, which is the value of tags.values
==>
[[0,2], [1,2], [1,0], [1,2]]
"""
new_indices = tf.stack([indice_idx, value_arr], axis=1)

with tf.Session() as s:
    s.run([tf.global_variables_initializer(), tf.tables_initializer()])
    print(s.run(value_arr))
    print(s.run(tags.values))
    print(s.run(new_indices))
    print(s.run(tags.indices[3, 1]))
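
One point may be worth double-checking against the requirement: `tf.gather(tags.values, value_idx)` looks values up by column index, giving [2, 2, 0, 2], whereas the target indices [[0, 2], [1, 0], [1, 2], [1, 5]] pair each row index with the value stored at the same position. The plain-Python sketch below emulates both variants with lists so the difference is visible; it makes no TensorFlow calls, and the names `stacked_gather` and `stacked_direct` are introduced here for illustration only.

```python
indices = [[0, 0], [1, 0], [1, 1], [1, 2]]
values = [2, 0, 2, 5]

indice_idx = [row for row, _col in indices]  # row index of every stored entry
value_idx = [col for _row, col in indices]   # column index of every stored entry

# Variant A: emulate tf.gather(values, value_idx), i.e. look up by column index.
gathered = [values[i] for i in value_idx]
stacked_gather = [list(pair) for pair in zip(indice_idx, gathered)]

# Variant B: pair each row index with the value stored at the same position.
stacked_direct = [list(pair) for pair in zip(indice_idx, values)]

print(stacked_gather)  # [[0, 2], [1, 2], [1, 0], [1, 2]]
print(stacked_direct)  # [[0, 2], [1, 0], [1, 2], [1, 5]]
```

Under this reading, the in-graph equivalent of variant B would presumably stack `tags.indices[:, 0]` with `tf.cast(tags.values, tf.int64)` along axis=1, skipping the gather step; that is an assumption about the intended result, not something the answer states.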

This problem itself is solved.

A separate, related problem

P.S. This does not work when the data is read from a file; see:

create multi-hot SparseTensor by categorical feature array column from CSV in TensorFlow

fengda