
I have built a model and can successfully prune it using tf.contrib's model pruning module with the default parameters and a target sparsity of 90%. The problem is that the pruned model still takes the same amount of execution time as the original. My guess is that, instead of running only the pruned subgraph, TensorFlow runs the entire graph with masked weights, which is why there is no improvement even after pruning.

So how do I export the pruned model with only the remaining subgraph and its corresponding weights, and then run it?

Avinash Rai

1 Answer


The strip_pruning_vars utility might be what you're looking for.

From the README: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/model_pruning#adding-pruning-ops

Removing pruning ops from the trained graph

Once the model is trained, it is necessary to remove the auxiliary variables (mask, threshold) and pruning ops added to the graph in the steps above. This can be accomplished using the strip_pruning_vars utility.
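
If it helps, here is a minimal sketch of that step using the library functions behind the strip_pruning_vars script. The module path, function names, checkpoint directory, and output node name below are assumptions based on the tf.contrib.model_pruning source for TF 1.x and may differ in your version:

```python
# Hedged sketch: freeze the trained checkpoint and strip the pruning
# mask/threshold variables. Function names are taken from
# tf.contrib.model_pruning (TF 1.x) and may differ between versions;
# the checkpoint path and output node names are placeholders.
import tensorflow as tf
from tensorflow.contrib.model_pruning.python import strip_pruning_vars_lib

checkpoint_dir = "/path/to/checkpoints"   # assumption: where training saved checkpoints
output_node_names = ["logits"]            # assumption: your graph's output op name(s)

# Convert checkpoint variables to constants in a GraphDef.
graph_def = strip_pruning_vars_lib.graph_def_from_checkpoint(
    checkpoint_dir, output_node_names)

# Remove the mask/threshold variables and pruning ops, folding the mask
# into the weight constants.
stripped_graph_def = strip_pruning_vars_lib.strip_pruning_vars_fn(
    graph_def, output_node_names)

# Write the stripped graph for inference.
tf.train.write_graph(stripped_graph_def, "/tmp", "pruning_stripped.pb",
                     as_text=False)
```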

Would you mind sharing your code?

Eric Page
  • Thank you for the response, I will try this method as soon as possible – Avinash Rai Aug 27 '18 at 18:19
  • @AvinashRai 1) Did strip_pruning_vars have any impact on speed-up? 2) Did you try the CIFAR10 example provided by TensorFlow? Did you see any speedup in inference? – Anil Maddala Aug 28 '18 at 20:11
  • @AnilMaddala This does not improve speed. The reason is that, as per the docs, strip_pruning_vars does not recreate the model graph with smaller (pruned) weight tensors; instead, it applies a binary mask over the weights without changing their shape. What I am looking for is removing the unnecessary weights, which is what would improve the speed. – Avinash Rai Aug 29 '18 at 01:40
  • Roman Nikishin contacted Michael H. Zhu, one of the creators of this library, and received this response: "The model weights are stored as a dense weights tensor and a dense mask tensor (0 or 1 tensor). To save the pruned model, you would save both weights and mask, and you can reconstruct the sparse tensor as masked_weights = mask .* weights. To get a speedup, you have to implement your own inference code taking advantage of the sparsity which TensorFlow does not currently do." Post here: https://stackoverflow.com/questions/52064450/how-to-use-tf-contrib-model-pruning-on-mnist (a sketch of this reconstruction follows the comments below). – Eric Page Sep 26 '18 at 12:56
  • In short, it sounds like this isn't possible to do with any existing tensorflow library. – Eric Page Sep 26 '18 at 12:58
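
A minimal sketch of the masked_weights = mask .* weights reconstruction mentioned in the comments above; the checkpoint path and variable names ("layer1/weights", "layer1/mask") are placeholders and depend on how pruning was wired into your graph:

```python
# Hedged sketch: rebuild the sparse weights from a pruned checkpoint.
# Variable names below are illustrative, not the actual names in your model.
import numpy as np
import tensorflow as tf

reader = tf.train.NewCheckpointReader("/path/to/checkpoints/model.ckpt")

weights = reader.get_tensor("layer1/weights")  # dense weight tensor
mask = reader.get_tensor("layer1/mask")        # 0/1 tensor of the same shape

masked_weights = mask * weights                # zero out the pruned connections
print("sparsity:", 1.0 - np.count_nonzero(masked_weights) / masked_weights.size)

# To actually get a speedup you would feed masked_weights into your own
# sparse inference path (e.g. scipy.sparse), since TensorFlow's dense
# kernels do not skip the zeros.
```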