I'm writing a pruning algorithm for tf.keras that simply zeroes out the lowest x-th percentile of weights (by magnitude) in a layer/filter. From what I've read, setting a weight to zero should have the same effect as "removing" it from the network, but even when I set every weight in the network to zero, I observe no decrease in inference time.
If I were to hypothetically set all the weights in a layer to zero, the code would be as follows:
weights = self.model.layers[layer_index].get_weights()
original_shape = weights[0].shape
flat_weights = weights[0].flatten()
for weight_index, weight in enumerate(flat_weights):
    # if weight < self.delta_percentiles[layer_index]:
    flat_weights[weight_index] = 0
weights[0] = np.reshape(flat_weights, original_shape)
weights[1] = np.zeros(np.shape(weights[1]))  # zero the biases as well
self.model.layers[layer_index].set_weights(weights)
Theoretically, the inference time of a model pruned this way should decrease, but I measure no change at all. Am I pruning correctly?
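For reference, here is the percentile-based masking step (the commented-out condition above) as a standalone NumPy sketch, so the intended behaviour is unambiguous. The function name and the use of `np.percentile` on absolute values are my own illustration, not part of my actual class:

```python
import numpy as np

def prune_by_percentile(weights, percentile):
    """Zero out all weights whose magnitude falls below the given percentile.

    Illustrative sketch only: returns a new array, leaving `weights` untouched.
    """
    threshold = np.percentile(np.abs(weights), percentile)
    mask = np.abs(weights) >= threshold  # keep only weights at/above threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
pruned = prune_by_percentile(w, 50)  # roughly half the entries become zero
```

Note that this produces a dense array of the same shape that merely contains zeros, which is exactly what `set_weights` receives in my code above.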