I want to run inference on an fp32 model using fp16 to verify the half-precision results. After loading the checkpoint, the parameters can be converted to float16, but then how do I use these fp16 parameters in a session?

import tensorflow as tf

reader = tf.train.NewCheckpointReader(model_file)
var_to_dtype_map = reader.get_variable_to_dtype_map()

vals_f16 = {}
for key in var_to_dtype_map:
    tsr = reader.get_tensor(key)            # numpy array, fp32
    vals_f16[key] = tf.cast(tsr, tf.float16)

# sess.restore() ???

1 Answer


I found a way to do it.

  1. Load the checkpoint with tf.train.NewCheckpointReader(), then read the parameters and convert them to float16.
  2. Use the converted float16 parameters to initialize the layers (a fuller sketch follows this list):
    weight_name = scope_name + '/' + get_layer_str() + '/' + 'weight'
    initw = inits[weight_name]
    weight = tf.get_variable('weight', dtype=initw.dtype, initializer=initw)
    out = tf.nn.conv2d(self.get_output(), weight, strides=[1, stride, stride, 1], padding='SAME')
  3. Run the graph.
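
For reference, here is a minimal end-to-end sketch of the three steps. The checkpoint path (model_file), the scope/variable name conv1/weight, and the input shape are placeholders for illustration, not names from the original model:

import numpy as np
import tensorflow as tf

# Step 1: read the fp32 checkpoint and cast every parameter to fp16.
reader = tf.train.NewCheckpointReader(model_file)
inits = {name: reader.get_tensor(name).astype(np.float16)
         for name in reader.get_variable_to_dtype_map()}

# Step 2: rebuild the graph, initializing each variable from its fp16 copy.
# tf.get_variable infers the shape from the initializer array.
x = tf.placeholder(tf.float16, shape=[1, 224, 224, 3])
with tf.variable_scope('conv1'):
    # 'conv1/weight' must match the variable's name in the checkpoint.
    weight = tf.get_variable('weight', dtype=tf.float16,
                             initializer=inits['conv1/weight'])
out = tf.nn.conv2d(x, weight, strides=[1, 1, 1, 1], padding='SAME')

# Step 3: run the graph. The initializers already hold the checkpoint
# values, so there is no saver.restore() step.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    result = sess.run(out, feed_dict={x: np.zeros([1, 224, 224, 3], np.float16)})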

My GPU is a GTX 1080, which has no Tensor Cores, yet inference with fp16 is 20%-30% faster than with fp32. I don't understand why. Which hardware units do the fp16 math, the same units traditionally used for fp32?
