Your code looks like it is using a sparse operation (SparseLengthsSum), which may lead to issues with the gradient calculation, as some of the gradients will be undefined.
(This follows up on your previous question, "How to ensure that a tensor is in dense representation in caffe2", which I answered earlier.)
That sparse operation gathers slices of a data tensor along its first dimension according to an indices tensor, groups the gathered slices according to a lengths tensor, and then reduces each group by summing it.
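For concreteness, here is a minimal NumPy sketch of that forward semantics (the example values are made up; in practice the operator runs inside a Caffe2 net rather than in NumPy):

```python
import numpy as np

def sparse_lengths_sum(data, indices, lengths):
    """NumPy sketch of SparseLengthsSum's forward semantics:
    gather rows of `data` by `indices`, then sum the gathered rows
    in groups whose sizes are given by `lengths`."""
    gathered = data[indices]                                  # gather along the first dimension
    out = np.zeros((len(lengths),) + data.shape[1:], dtype=data.dtype)
    offset = 0
    for i, n in enumerate(lengths):
        out[i] = gathered[offset:offset + n].sum(axis=0)      # reduce each group by summing
        offset += n
    return out

# Made-up example values
data = np.arange(12, dtype=np.float32).reshape(4, 3)          # 4 rows of size 3
indices = np.array([0, 2, 2, 3])                              # which rows to gather
lengths = np.array([2, 2])                                    # two groups of 2 gathered rows
print(sparse_lengths_sum(data, indices, lengths))
# [[ 6.  8. 10.]   row0 + row2
#  [15. 17. 19.]]  row2 + row3
```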
The issue arises when you try to backpropagate through this operation. Because SparseLengthsSum only touches the rows of the data tensor selected by the indices, its gradient with respect to that tensor is only defined for those rows. Caffe2 therefore propagates the gradient in a sparse form (a set of row indices plus the corresponding gradient slices) rather than as a full dense tensor, and every downstream operator has to be able to handle that representation.
With "Gradient of output .../Transpose is sparse (expected dense)
", it appears the backward pass is expecting a dense gradient (a gradient for every element of the input tensor), but the SparseLengthsSum
operation is only providing a sparse gradient (a gradient for only some elements of the input tensor). That discrepancy between expected and provided gradients is likely causing the error.
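To make that concrete, here is a NumPy sketch (again, not Caffe2 code) of what the backward pass of such a gather-and-segment-sum computes. Only the rows that appear in the indices receive any gradient, which is why the gradient is naturally represented in a sparse (indices plus values) form:

```python
import numpy as np

def sparse_lengths_sum_grad(grad_out, indices, lengths, data_shape):
    """Sketch of the backward pass: each gathered row receives the upstream
    gradient of the group it was summed into; rows of the data tensor that
    were never gathered receive no gradient at all."""
    dense_grad = np.zeros(data_shape, dtype=grad_out.dtype)
    offset = 0
    for i, n in enumerate(lengths):
        for idx in indices[offset:offset + n]:
            dense_grad[idx] += grad_out[i]                    # scatter-add into the gathered rows
        offset += n
    return dense_grad

# Made-up values matching the forward sketch above
grad_out = np.ones((2, 3), dtype=np.float32)
print(sparse_lengths_sum_grad(grad_out, [0, 2, 2, 3], [2, 2], (4, 3)))
# Row 1 stays all zeros: it was never gathered, so a sparse
# (indices, values) representation of this gradient omits it entirely.
```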
A workaround would be to use the ReduceSum operator in a loop: iterate over your tensor, use ReduceSum to compute the sum of the elements up to each index, and write each partial sum into the output tensor with ScatterAssign (a sketch follows below).
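Here is a minimal NumPy sketch of that loop, just to show the logic; in an actual Caffe2 net each step would be expressed with operators such as Slice, ReduceSum, and ScatterAssign rather than NumPy calls:

```python
import numpy as np

def cumsum_via_loop(x):
    """Sketch of the workaround's logic: for each index i, sum the prefix
    x[0 : i + 1] (what ReduceSum would do on a slice of the tensor) and
    write the result into position i of the output (what ScatterAssign
    would do)."""
    out = np.zeros_like(x)
    for i in range(len(x)):
        out[i] = x[: i + 1].sum()     # prefix sum for index i
    return out

x = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
print(cumsum_via_loop(x))             # [ 1.  3.  6. 10.]
print(np.cumsum(x))                   # reference result
```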
If you need a more efficient and flexible solution, and you are not strictly tied to Caffe2, I would recommend switching to a more modern deep learning framework such as PyTorch, TensorFlow, or JAX, all of which support a cumsum operation natively and have robust support for autograd.
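For example, PyTorch's torch.cumsum gives you both the forward result and the gradients in one line:

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
y = torch.cumsum(x, dim=0)     # tensor([ 1.,  3.,  6., 10.])
y.sum().backward()             # autograd handles the backward pass
print(x.grad)                  # tensor([4., 3., 2., 1.])
```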
That workaround can be helpful because it allows you to replicate the cumulative sum behavior, even though Caffe2 does not provide an out-of-the-box operator for that purpose.
In essence, each element of a cumulative sum is just the sum of all elements of the sequence up to that index. So, by iterating over your tensor's elements and accumulating the sum, you are essentially performing the cumulative sum operation.
Specifically, the ReduceSum operator returns the sum of all the elements of a tensor. You would typically apply it to a slice of the tensor to get the sum of all elements up to the current index, and then write that partial sum into the corresponding position of the output tensor (e.g., with ScatterAssign).
As mentioned before, this workaround can be quite inefficient, especially for large tensors. It also does not make gradient computation any easier, since it involves manually manipulating tensor elements within a loop.