How can I use TensorFlow to do convolution in fp16 on the GPU? (Using the Python API, but with the computation actually done in __half / Eigen::half.)
I want to test a model in fp16 on TensorFlow, but I got stuck. It looks like fp16 convolution in TensorFlow just casts the fp32 convolution result into fp16, which is not what I need.

I tried giving tf.nn.conv2d an fp16 input in fp16 format, and giving tf.nn.conv2d the same fp16 input in fp32 format (tf.cast it to fp32, then tf.cast the result back to fp16), and both gave exactly the same result. But as I understand it, doing the convolution in fp16 should be different from doing it in fp32 and then casting the result to fp16. Am I wrong? Please help me, thanks.
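To show why I expect a difference, here is a plain-NumPy sketch (my own illustration, not TensorFlow code) of a single dot product, i.e. the inner loop of a convolution: accumulating in fp16 generally does not match accumulating in fp32 and rounding the final sum to fp16 once.

```python
import numpy as np

rng = np.random.RandomState(0)
a = rng.rand(1000).astype(np.float16)
b = rng.rand(1000).astype(np.float16)

# Accumulate entirely in fp16: every partial sum is rounded to half precision.
acc16 = np.float16(0)
for x, y in zip(a, b):
    acc16 = np.float16(acc16 + np.float16(x * y))

# Accumulate in fp32, then round the final result to fp16 only once.
acc32 = np.float32(0)
for x, y in zip(a, b):
    acc32 += np.float32(x) * np.float32(y)
acc32_to_16 = np.float16(acc32)

print(acc16, acc32_to_16)  # the two fp16 results will almost surely differ
```

With 1000 terms the running fp16 sum sits around 250, where the fp16 grid spacing is already 0.25, so each partial-sum rounding can move the result by a visible amount.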
Environment:
Ubuntu 16.04
TensorFlow 1.9.0
CUDA 9.0
Tesla V100
import os

import numpy as np
import tensorflow as tf

# Select the GPU before TensorFlow creates its CUDA context.
os.environ['CUDA_VISIBLE_DEVICES'] = '1'

def conv16_32(inp, kernel):  # "fake" fp16 convolution: round to fp16, compute in fp32
    inp = tf.cast(tf.cast(inp, tf.float16), tf.float32)
    kernel = tf.cast(tf.cast(kernel, tf.float16), tf.float32)
    out = tf.nn.conv2d(inp, kernel, [1, 1, 1, 1], padding='VALID')
    out = tf.cast(tf.cast(out, tf.float16), tf.float64)
    return out

def conv16(inp, kernel):  # "real" fp16 convolution: the conv op itself runs in fp16
    inp = tf.cast(inp, tf.float16)
    kernel = tf.cast(kernel, tf.float16)
    out = tf.nn.conv2d(inp, kernel, [1, 1, 1, 1], padding='VALID')
    return tf.cast(out, tf.float64)

x = np.random.rand(16, 32, 32, 16).astype('float64')
w = np.random.rand(3, 3, 16, 16).astype('float64')
x = tf.get_variable('input', dtype=tf.float64, initializer=x)
w = tf.get_variable('weight', dtype=tf.float64, initializer=w)
out_16 = conv16(x, w)
out_16_32 = conv16_32(x, w)

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
sess.run(tf.global_variables_initializer())
sess.run(tf.local_variables_initializer())
print(sess.run(tf.reduce_max(tf.abs(out_16_32 - out_16))))
The two functions above give exactly the same result: the final print outputs zero. But in my understanding, the result of an fp16 convolution and an fp32 convolution should not be identical. How can I make TensorFlow do the convolution in real fp16 on the GPU? (With the Python API, but computing in __half / Eigen::half.)
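For reference while debugging, one can compute a ground-truth fp16 convolution entirely in NumPy, with every partial sum rounded to half precision (a slow, naive sketch; conv2d_fp16_ref is my own hypothetical helper, not a TensorFlow API), and compare it against what tf.nn.conv2d returns:

```python
import numpy as np

def conv2d_fp16_ref(x, w):
    """Naive VALID conv2d (NHWC input, HWIO kernel) accumulating every sum in fp16."""
    x = x.astype(np.float16)
    w = w.astype(np.float16)
    n, h, ww, cin = x.shape
    kh, kw, _, cout = w.shape
    out = np.zeros((n, h - kh + 1, ww - kw + 1, cout), dtype=np.float16)
    for b in range(n):
        for i in range(h - kh + 1):
            for j in range(ww - kw + 1):
                for co in range(cout):
                    acc = np.float16(0)  # partial sums are rounded to fp16 each step
                    for di in range(kh):
                        for dj in range(kw):
                            for ci in range(cin):
                                acc = np.float16(acc + x[b, i + di, j + dj, ci] * w[di, dj, ci, co])
                    out[b, i, j, co] = acc
    return out

# Tiny example so the loop nest stays fast.
x = np.random.rand(1, 4, 4, 2)
w = np.random.rand(3, 3, 2, 1)
print(conv2d_fp16_ref(x, w).shape)  # (1, 2, 2, 1)
```

On the 16x32x32x16 input above this loop nest would be far too slow, but on a small slice it gives an unambiguous fp16 reference against which to compare both TensorFlow paths.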