
I'm writing a loss function in Python with Caffe2 that will receive a tensor and, as the first step, compute the transpose of the input tensor:

pred_t = net.Transpose(prediction)

Because of the particular setup I am working with, I do not have full visibility into the code that passes the input tensor to my code, and so I'm not sure exactly how this tensor is produced, but I get the following error:

Exception when creating gradient for [Transpose]:[enforce fail at operator_gradient.h:150] g_output_.at(i).IsDense().
Gradient of output <redacted>/Transpose is sparse (expected dense)..
Op: input: "<redacted>/0:ensemble/MultiClassManualWeightCalibration/softmax/prob"
output: "<redacted>/Transpose"
name: "" type: "Transpose" device_option { }

I have tried wrapping my input in SparseToDense operators, but this does not seem to have any impact. I cannot find any real documentation about sparse vs. dense tensors, and although I can see how the different representations are defined in the underlying Caffe2 code, I don't see any obvious way to translate between the formats.

Assuming that I am being passed a sparse tensor, how can I convert it to a dense tensor, or otherwise work with it effectively?

Isaac

1 Answer


Assuming that I am being passed a sparse tensor, how can I convert it to a dense tensor, or otherwise work with it effectively?

Gradient of output <redacted>/Transpose is sparse (expected dense)

That means the gradient flowing back into your Transpose operation is being computed as a sparse tensor, which is unexpected because Transpose produces a dense output and its gradient machinery requires a dense gradient. The issue might therefore not be with the input tensor itself, but with the operation that consumes the output of Transpose.

If you are trying to use the SparseToDense operator to convert the input tensor, it might not be working as expected because the input tensor is not sparse to begin with. It is also possible that the output of SparseToDense is not being used correctly, which is why you are not seeing any changes.

As for converting between sparse and dense tensors, in general, a sparse tensor can be converted to a dense tensor by filling in all the "missing" values with zeros. In Caffe2, you should be able to use the SparseToDenseMask operator for this purpose, which takes a sparse tensor and a mask as input and returns a dense tensor. However, it is not clear from your question whether this is applicable to your specific case.
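
Conceptually, the conversion is simple. Here is a minimal NumPy sketch (outside Caffe2, with made-up numbers) of densifying an (indices, values) pair by filling the missing positions with zeros:

import numpy as np

# Hypothetical sparse representation: positions of the non-zero entries and their values
indices = np.array([1, 4, 6])
values = np.array([0.3, 0.5, 0.2])

# Densify: allocate zeros, then scatter the known values into place
dense = np.zeros(8, dtype=values.dtype)
dense[indices] = values  # -> [0. 0.3 0. 0. 0.5 0. 0.2 0.]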

Beyond that, Caffe2 mainly provides support for representing sparse features and performing corresponding operations on segments of tensors. The documentation provides several examples of how sparse features might be represented (a small NumPy sketch of the first two follows the list):

  1. Values and lengths: This representation uses two tensors - one holding concatenated feature values and another having the number of feature values for each example. For matrices, it roughly corresponds to the Compressed Sparse Row (CSR) format but uses lengths instead of offsets.

  2. Segment IDs: This representation also concatenates values together, but has a second vector of the same length as the first dimension of the main tensor. Each element of the segment_ids maps the corresponding slice of the main tensor to one of the examples (called segments in this case).

  3. Padded representation: This representation stacks examples along the first dimension (e.g. rows in a matrix) and uses a filler value to make them of equal length.

  4. Sparse tensor: This comes from interpreting values as indices in some big sparse matrix. This is usually a very inefficient representation for practical purposes, but often is a semantic meaning of how features are used.
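
To make the first two representations concrete, here is a small NumPy sketch (the numbers are made up for illustration):

import numpy as np

# Three examples with 2, 1, and 3 feature values respectively.
# 1. Values and lengths:
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # concatenated feature values
lengths = np.array([2, 1, 3])                      # number of values per example

# 2. Segment IDs: same concatenated values, but each value is tagged
#    with the id of the example (segment) it belongs to:
segment_ids = np.array([0, 0, 1, 2, 2, 2])

# The two encodings are interchangeable, e.g. lengths -> segment_ids:
assert (np.repeat(np.arange(len(lengths)), lengths) == segment_ids).all()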

Regarding the choice between the two, the general idea is that sparse tensors are used when your data is mostly zeros, saving memory and potentially computation, while dense tensors are used when most of your data is non-zero. Each has its own advantages and use cases, depending on the specific requirements of your model and data.

You can see an example in "DeepReduce: A Sparse-tensor Communication Framework for Federated Deep Learning" (Xu, Kostopoulou, Dutta, Li, Ntoulas, and Kalnis, 2021):

Figure 1 of the paper: dense to sparse to optimized tensor representation.

  • Figure 1.a depicts an example tensor containing 8 real values, 4 of which are zero; its dense representation would require 256 bits.
  • Typically, sparse tensors are represented as a set of (key, value) pairs (see Figure 1.b), where key is the index; notice, however, that the (key, value) representation also requires 256 bits, negating the benefit of sparsity.
  • We show an example of our improved approach in Figure 1.c. We consider the indices as an ordered list represented by a Boolean array of 8 bits, such that the ith bit is '1' if and only if the corresponding gradient element is non-zero. Moreover, we fit a function V(i) = a·i + b to the gradient values, with parameters a = 0.6 and b = 4.0. By transmitting only the bit string and the parameters (a, b), we can reconstruct the original tensor while requiring only 72 bits. (A small sketch of this reconstruction follows.)
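
A tiny sketch of that reconstruction step. The bit pattern here is hypothetical, since the figure is not reproduced; a and b come from the quoted example, and i is taken to index the list of non-zero values:

import numpy as np

# Boolean index array: the ith bit is 1 iff the ith gradient element is non-zero
bitmap = np.array([0, 1, 0, 1, 1, 0, 0, 1], dtype=bool)  # hypothetical pattern
a, b = 0.6, 4.0  # fitted parameters of V(i) = a * i + b, from the example

# Reconstruct the dense tensor: evaluate the fit at each non-zero position
dense = np.zeros(len(bitmap))
dense[np.flatnonzero(bitmap)] = a * np.arange(bitmap.sum()) + b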

In your case, make sure you are actually consuming the output of the SparseToDense operator: if you see no change after applying it, a common cause is that the output blob it produces is never used. Assign the output of SparseToDense to a variable and use that variable in the rest of your code.

If you want to take the prediction tensor, convert it to a dense tensor, and then transpose it, you can use the SparseToDense operator followed by the Transpose operator.

First, extract the indices and values from the prediction tensor. The exact way to do this depends on how your sparse tensor is represented. For instance, if prediction were a dictionary mapping indices to values, you could do something like this:

indices = list(prediction.keys())
values = list(prediction.values())

Then, you can feed these arrays to blobs in your workspace, convert the sparse tensor to a dense one using the SparseToDense operator, and then transpose the resulting tensor using the Transpose operator:

workspace.FeedBlob("indices", np.array(indices))
workspace.FeedBlob("values", np.array(values))

net = core.Net("my_net")

# Convert the sparse tensor to a dense one
dense_prediction = net.SparseToDense(["indices", "values"], ["dense_prediction"])

# Transpose the dense tensor
pred_t = net.Transpose([dense_prediction], ["pred_t"])

Finally, you can run the operators:

workspace.CreateNet(net)
workspace.RunNet(net.Proto().name)
pred_t_value = workspace.FetchBlob("pred_t")  # retrieve the result as a numpy array

Remember to modify this example according to your specific use case. For instance, the dimensions and data types of your tensors might be different, and the format of your sparse tensor might also be different.


Can you expand on the ways in which misapplying an operator to the Transpose could cause this error?

I am currently passing it to a SparseLengthsSum, which does have a suggestive name, but whose documentation makes no actual mention of interpreting its input as sparse.

From your description, the error you are encountering occurs when creating the gradient for the Transpose operation. The error message suggests that the gradient of the output tensor from the Transpose operation is sparse, but the operation expected it to be dense.

The SparseLengthsSum operator in Caffe2 takes three inputs (DATA, INDICES, and LENGTHS) and sums slices of DATA gathered by INDICES, in groups whose sizes are given by LENGTHS. If your tensor is sparse and you are passing it as DATA, the operator will nonetheless treat it as dense, which might be causing the issue you are seeing.

In other words, the DATA input to SparseLengthsSum is expected to be dense, because the operator needs to index into it and sum over slices of it. If the input tensor is sparse, those gather and sum operations are not well defined, because the tensor does not have a value at every index.

The reason the SparseLengthsSum documentation makes no mention of interpreting its input as sparse is that the "sparse" in its name refers to this gather-then-reduce access pattern (a common way of processing sparse features), not to the storage format of its input. It expects a dense DATA tensor and performs its operations based on that assumption.
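
As a sanity check on those semantics, a rough NumPy equivalent of the forward pass looks like this (made-up data):

import numpy as np

data = np.array([[1., 2.], [3., 4.], [5., 6.]])  # dense DATA, one slice per row
indices = np.array([0, 2, 1, 1])                 # rows of DATA to gather
lengths = np.array([2, 2])                       # group sizes: two output rows

gathered = data[indices]
groups = np.split(gathered, np.cumsum(lengths)[:-1])
out = np.array([g.sum(axis=0) for g in groups])
# out[0] = data[0] + data[2]; out[1] = data[1] + data[1]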

When you then compute gradients during backpropagation, segment operators like SparseLengthsSum typically emit a sparse gradient (an indices/values pair) for their DATA input, since only the gathered slices contribute to the output. The gradient machinery of the upstream Transpose operator then receives a sparse tensor where it expects a dense one, leading to the error you are seeing.

In this case, it would be advisable to convert your sparse tensor to a dense tensor before passing it to the SparseLengthsSum operator. This will ensure that the operator has a dense tensor to work with, which should prevent the error from occurring.

You could use the SparseToDense operator as discussed earlier to convert your sparse tensor to a dense tensor before passing it to the SparseLengthsSum operator.
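
A minimal sketch of that wiring, assuming hypothetical blob names (your sparse input is taken to be an indices/values pair already fed into the workspace):

from caffe2.python import core

net = core.Net("loss_net")

# Densify the sparse (indices, values) pair first (blob names are illustrative)
dense_data = net.SparseToDense(["sparse_indices", "sparse_values"], ["dense_data"])

# Transpose the densified tensor, then hand it to SparseLengthsSum as DATA,
# together with the gather INDICES and segment LENGTHS blobs
pred_t = net.Transpose([dense_data], ["pred_t"])
summed = net.SparseLengthsSum([pred_t, "gather_indices", "segment_lengths"], ["summed"])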


It is not easy to directly check if a tensor in Caffe2 is sparse or dense because the format is implicitly defined by how the tensor is used in operations. However, if SparseToDense does not resolve the issue, it might not be solely a problem of tensor format.

Here are some things you could try (a combined sketch of the first three checks follows the list):

  1. Inspect the Tensor Data: You could print out the tensor data before the Transpose operation and examine its values. If it is indeed a sparse tensor, you should see a lot of zeroes.

  2. Verify the Dimensions: Make sure that the dimensions of the tensor are as expected before the Transpose operation. Sometimes issues can arise if a tensor does not have the expected dimensions.

  3. Check for NaN or Inf values: Sometimes, computations can result in NaN or Inf values in the tensor, which can cause issues. You might want to check if your tensor contains any such values.

  4. Try a Different Operator: Instead of SparseToDense, you could try using a different operator to convert the tensor to dense format. For example, the LengthsToValues operator can convert a sparse tensor to a dense one.

  5. Check the Computation Graph: Make sure that the tensor is not being used in any other operations that might be affecting its format. For example, if there is another operation that's expecting a sparse tensor and you are passing in a dense tensor, that could cause issues.
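
A minimal sketch combining the first three checks, assuming your input tensor lives in a blob whose name you know (the name "prediction" here is hypothetical):

import numpy as np
from caffe2.python import workspace

# Pull the blob out of the workspace as a numpy array (hypothetical blob name)
t = workspace.FetchBlob("prediction")

print("shape:", t.shape, "dtype:", t.dtype)                     # 2. verify dimensions
print("fraction of zeros:", np.mean(t == 0))                    # 1. mostly zeros?
print("NaNs:", np.isnan(t).any(), "Infs:", np.isinf(t).any())   # 3. bad values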

VonC
  • Thanks for the long response to what I know is a vague question. Can you expand on the ways in which mis-applying an operator to the `Transpose` could cause this error? I'm currently passing it to a `SparseLengthsSum`, which does have a suggestive name, but whose documentation makes no actual mention of interpreting its input as sparse. – Isaac Jun 05 '23 at 20:15
  • @Isaac Sure. I have edited the answer to address your comment. – VonC Jun 05 '23 at 20:24
  • From your last update it sounds like our best theory is still that the input tensor is sparse? As mentioned I already tried wrapping the input tensor in `SparseToDense` (so that the whole operation looks like `SparseLengthsSum( [Transpose( SparseToDense( [..., input] ) ), ...] )`), but still got the same error. Any way to check if the input tensor is really sparse? Anything else worth trying? – Isaac Jun 05 '23 at 21:45
  • @Isaac There are some checks you can do, yes. I have edited the answer to list them. – VonC Jun 05 '23 at 22:10