
I'm currently writing my BA thesis on hardware and frameworks for AI inference. In my research I looked up TensorRT and found a table which I don't really understand.

Table

Sadly, there is no real explanation for it apart from the title. I understand that there are different CNN models with different numbers of layers, and that up to a certain point adding layers increases accuracy, but too many layers can also cause problems.

But I don't understand how this is related to FP32 and INT8, or what the table is trying to tell me. It would be nice if someone could help me out here. I also don't really know what they mean by "retraining".

Thanks for any answer

  • Where did you find this table? – Eric Postpischil Jun 19 '22 at 10:14
  • H Vanholder - GPU Technology Conference, 2016. If you type this into Google or Google Scholar you should find the PDF I'm referring to. The table is on slide 17 of the PDF. – Morty687 Jun 19 '22 at 17:01
  • Re “If you type this in google or google scholar”: We have a system of Uniform Resource Locators (URLs) that provide direct links to documents. Do not expect readers to search for things. When you cite a document, give its URL and bibliographic information for it. – Eric Postpischil Jul 07 '22 at 23:40

1 Answer


Have you understood what TensorRT does to the models? Please read this page if you haven't already.

Decreasing the precision of the weights and activations is one of the many optimizations TensorRT applies to a model. Traditionally, neural networks use 32-bit floating-point numbers (FP32) for their arithmetic. Researchers have found that lower precision (16-bit floating point, or 8-bit integers) can be used for the operations instead. In many cases this does not noticeably decrease the accuracy of the network, but it considerably increases inference performance, since INT8 values take a quarter of the memory and modern GPUs execute INT8 arithmetic much faster.
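To make the FP32-to-INT8 idea concrete, here is a minimal sketch of symmetric linear quantization with NumPy. The weight values are made up for illustration; TensorRT's actual scheme is more sophisticated, but the core idea of mapping an FP32 range onto the integers −127…127 via a scale factor is the same:

```python
import numpy as np

# Hypothetical FP32 weights of one layer (illustrative values only).
weights_fp32 = np.array([0.81, -0.42, 0.07, -0.95], dtype=np.float32)

# Symmetric linear quantization: map the FP32 range onto [-127, 127].
scale = np.abs(weights_fp32).max() / 127.0
weights_int8 = np.round(weights_fp32 / scale).astype(np.int8)

# Dequantize to see how much rounding error was introduced.
weights_restored = weights_int8.astype(np.float32) * scale
max_error = np.abs(weights_fp32 - weights_restored).max()

print(weights_int8)   # each weight now occupies 1 byte instead of 4
print(max_error)      # small relative to the weight magnitudes
```

The small `max_error` is why the table's INT8 accuracy columns stay so close to the FP32 columns: the rounding noise is tiny compared to the weights themselves, so the network's predictions barely change.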

Also I don't really know what they mean with "retraining".

Training a neural network takes a lot of time and money: we are talking about weeks of training and, for large models, millions of dollars in costs. Instead of lowering the precision of the weights before training, which would require retraining the model from scratch, TensorRT optimizes them after training is finished. This takes orders of magnitude less time and is far more convenient.
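The "no retraining" part works through calibration: you run a small set of representative inputs through the already-trained FP32 model, record the ranges of the activations, and derive the INT8 scale factors from those ranges. The sketch below uses simple max-calibration on made-up data; TensorRT actually uses an entropy-based calibration, but the workflow is the same idea:

```python
import numpy as np

def calibrate_scale(activation_batches):
    """Derive an INT8 scale from representative inputs, no retraining.
    (Simplified max-calibration; TensorRT uses an entropy-based method.)"""
    max_abs = max(np.abs(batch).max() for batch in activation_batches)
    return max_abs / 127.0

def quantize(x, scale):
    """Quantize FP32 activations to INT8 using the calibrated scale."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

# Hypothetical activations recorded while feeding calibration data
# through the trained FP32 model (random stand-in values here).
rng = np.random.default_rng(0)
calibration_data = [rng.uniform(-3.0, 3.0, size=64) for _ in range(10)]

scale = calibrate_scale(calibration_data)
sample_int8 = quantize(calibration_data[0], scale)
```

Because only this calibration pass is needed, converting an existing model to INT8 takes minutes rather than the weeks a full retraining would.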

Eljas Hyyrynen