I have a normal feed-forward network that produces a vector v. The elements of v are then used as the non-zero entries of a sparse matrix M (assume the coordinates are predefined). The sparse matrix is then multiplied by a dense vector and a loss is defined on the resulting scalar. I want to back-propagate the loss w.r.t. the weights of the network, which entails going through the sparse matrix.
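In stripped-down form the graph looks something like this (the one-layer "network", coordinates, and dense vector below are just stand-ins for my actual setup):

import tensorflow as tf

# Stand-in for the feed-forward network: a single layer producing the vector v.
x = tf.constant([[1., 2., 3., 4.]])
W = tf.Variable(tf.random_normal([4, 3]))
v = tf.squeeze(tf.matmul(x, W))                  # the non-zero entries of M

coords = tf.constant([[0, 0], [1, 1], [2, 2]], dtype=tf.int64)  # predefined coordinates
M = tf.SparseTensor(coords, v, [3, 3])           # sparse matrix built from v

dense_vec = tf.constant([[1.], [2.], [3.]])
scalar = tf.reduce_sum(tf.sparse_tensor_dense_matmul(M, dense_vec))
loss = tf.square(scalar)                         # some loss on the resulting scalar
# Goal: tf.gradients(loss, [W]), which has to go through M.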

This seems like a perfectly reasonable use-case for a sparse matrix, but it appears that such functionality is not supported. Indeed, even calling tf.gradients(M,[v]) produces an error:

AttributeError: 'SparseTensor' object has no attribute 'value_index'

Am I doing something wrong or am I correct in presuming that this functionality doesn't (yet?) exist? If the latter, then is there a work-around for this particular use-case short of rewriting all of the sparse tensor operations with gradients defined?

zergylord

2 Answers


A slight variation on this does work, taking the gradient of the values of a SparseTensor directly:

import tensorflow as tf
# tf.identity is needed here for the gradient to be defined (see the note below).
sparse_values = tf.identity(tf.Variable(tf.constant([1., 2., 3.])))
sparse_indices = tf.constant([[0, 0], [1, 1], [2, 2]], dtype=tf.int64)
sparse_matrix = tf.SparseTensor(sparse_indices, sparse_values, [3, 3])
multiplied = tf.sparse_tensor_dense_matmul(sparse_matrix, tf.eye(3))
loss = tf.reduce_sum(multiplied)
# Differentiate the loss with respect to the values Tensor, not the SparseTensor itself.
gradients = tf.gradients(loss, [sparse_values])
with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(session.run(gradients))

Prints (on TensorFlow 0.12.1):

[array([ 1.,  1.,  1.], dtype=float32)]

I haven't quite figured out why the tf.identity op is necessary for the gradient to be defined (probably something to do with ref dtypes).
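Extending that to something closer to the question's setup, the gradient with respect to the network weights also seems to flow through the SparseTensor's values, as long as tf.gradients is given the loss rather than the SparseTensor itself (the tiny one-layer "network" here is just a stand-in):

import tensorflow as tf

x = tf.constant([[1., 2., 3., 4.]])
W = tf.Variable(tf.random_normal([4, 3]))
v = tf.squeeze(tf.matmul(x, W))      # network output used as the sparse values
# v is already a regular (non-ref) Tensor, so the tf.identity wrapper shouldn't be needed here.

indices = tf.constant([[0, 0], [1, 1], [2, 2]], dtype=tf.int64)
M = tf.SparseTensor(indices, v, [3, 3])

dense_vec = tf.constant([[1.], [2.], [3.]])
loss = tf.reduce_sum(tf.sparse_tensor_dense_matmul(M, dense_vec))
gradients = tf.gradients(loss, [W])  # gradient w.r.t. the network weights

with tf.Session() as session:
    tf.global_variables_initializer().run()
    print(session.run(gradients))    # a [4, 3] gradient array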

Allen Lavoie

I'm fishing around in the dark here, working from code and documentation, not experience.

The Tensor class constructor is:

def __init__(self, op, value_index, dtype):
    #  value_index: An `int`. Index of the operation's endpoint that produces this tensor.

The value_index is used to generate the Tensor name.

The SparseTensor one is:

def __init__(self, indices, values, dense_shape):

Nowhere in its definition file tensorflow/tensorflow/python/framework/sparse_tensor.py is value_index referenced.

Its arguments are Tensors, presumably each with its own value_index.
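Poking at the components directly seems to bear that out:

import tensorflow as tf

values = tf.constant([1., 2., 3.])
indices = tf.constant([[0, 0], [1, 1], [2, 2]], dtype=tf.int64)
M = tf.SparseTensor(indices, values, [3, 3])

print(hasattr(M, 'value_index'))     # False: a SparseTensor has no value_index
print(M.values.value_index)          # 0: each component Tensor has its own value_index
print(M.indices.value_index)         # 0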

Elsewhere it appears that a SparseTensor is an alternative to an IndexedSlices, which also contains Tensors.

The inputs to tf.gradients are all documented as:

A `Tensor` or list of tensors 

The gradients definition file has a _IndexedSlicesToTensor method, but nothing equivalent for SparseTensor. So there seems to be some sort of automatic conversion to dense in the case of IndexedSlices (with a warning if the result is too big), but not for SparseTensors. I don't know if that's a case of incomplete development or an incompatibility that makes it impossible.
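If the matrix is small enough to densify, a manual equivalent of that conversion looks possible: add the SparseTensor to a dense zeros tensor and differentiate the dense result. This is only a sketch, assuming the sparse-plus-dense add path has a gradient registered for the values:

import tensorflow as tf

values = tf.constant([1., 2., 3.])
indices = tf.constant([[0, 0], [1, 1], [2, 2]], dtype=tf.int64)
M = tf.SparseTensor(indices, values, [3, 3])

# Densify by hand (only viable when the dense shape fits in memory),
# then take gradients of the dense result with respect to the values Tensor.
M_dense = tf.sparse_add(tf.zeros([3, 3]), M)
gradients = tf.gradients(tf.reduce_sum(M_dense), [values])

with tf.Session() as session:
    print(session.run(gradients))    # expect [array([1., 1., 1.], dtype=float32)]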

hpaulj