I want to run inference on an fp32 model using fp16 to verify the half-precision results. After loading the checkpoint, the parameters can be converted to float16, but then how do I use these fp16 parameters in a session?

import tensorflow as tf

reader = tf.train.NewCheckpointReader(model_file)
var_to_dtype_map = reader.get_variable_to_dtype_map()

vals_f16 = {}
for key in var_to_dtype_map:
    tsr = reader.get_tensor(key)            # numpy array, fp32
    vals_f16[key] = tf.cast(tsr, tf.float16)

# sess.restore() ???

1 Answer


I found a way to do it.

  1. Load the checkpoint with tf.train.NewCheckpointReader(), then read the parameters and convert them to float16.
  2. Use the converted float16 parameters to initialize the layers (a fuller sketch follows this list):
    weight_name = scope_name + '/' + get_layer_str() + '/' + 'weight'
    initw = inits[weight_name]
    weight = tf.get_variable('weight', dtype=initw.dtype, initializer=initw)
    out = tf.nn.conv2d(self.get_output(), weight, strides=[1, stride, stride, 1], padding='SAME')
  3. Run the graph.
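
For reference, here is a minimal end-to-end sketch of the three steps. The checkpoint path (model_file), the scope/variable name conv1/weight, and the input shape are placeholders for illustration, not names from the original model:

import numpy as np
import tensorflow as tf

# Step 1: read the fp32 checkpoint and cast every parameter to fp16.
reader = tf.train.NewCheckpointReader(model_file)
inits = {name: reader.get_tensor(name).astype(np.float16)
         for name in reader.get_variable_to_dtype_map()}

# Step 2: rebuild the graph, initializing each variable from its fp16 copy.
# tf.get_variable infers the shape from the initializer array.
x = tf.placeholder(tf.float16, shape=[1, 224, 224, 3])
with tf.variable_scope('conv1'):
    # 'conv1/weight' must match the variable's name in the checkpoint.
    weight = tf.get_variable('weight', dtype=tf.float16,
                             initializer=inits['conv1/weight'])
out = tf.nn.conv2d(x, weight, strides=[1, 1, 1, 1], padding='SAME')

# Step 3: run the graph. The initializers already hold the checkpoint
# values, so there is no saver.restore() step.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(out, feed_dict={x: np.zeros([1, 224, 224, 3], np.float16)})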

My GPU is a GTX 1080, which has no Tensor Cores, yet inference with fp16 is 20%-30% faster than with fp32. I don't understand why. Which hardware units do the fp16 math, the same units traditionally used for fp32?
