I compiled TensorFlow 1.3 from source and was unpleasantly surprised by its performance. After taking the community's comments into account, I managed to reduce NumPy's advantage over TensorFlow from 45% to 35% for computation on the CPU. Still, the difference is huge. The benchmark code is given below:
#! /usr/bin/env python3
import sys
import time
import numpy as np
import tensorflow as tf
print('Python', sys.version)
print('TensorFlow', tf.__version__)
gDType = np.float64
size = 8192
# NumPy calculation (same dtype as the TensorFlow side)
rand_array = np.random.uniform(0, 1, (size, size)).astype(gDType, copy=False)
timer0 = time.time()
res = np.dot(np.dot(rand_array, rand_array), rand_array)
print("numpy multiply: %f" % (time.time() - timer0))
# TensorFlow calculation
x = tf.Variable(
    tf.random_uniform(shape=(size, size), minval=0, maxval=1, dtype=gDType),
    dtype=gDType, name='x')
x3 = tf.matmul(tf.matmul(x, x), x)
# Avoid optimizing away redundant nodes
config = tf.ConfigProto(graph_options=tf.GraphOptions(optimizer_options=tf.OptimizerOptions(opt_level=tf.OptimizerOptions.L0)))
sess = tf.Session(config=config)
# sess = tf.Session()
sess.run(tf.global_variables_initializer())
# The first pass includes one-time graph setup delays; the second pass excludes them
timer0 = time.time()
sess.run(x3.op)
print("tensorflow multiply 1 pass: %f" % (time.time() - timer0))
timer0 = time.time()
sess.run(x3.op)
print("tensorflow multiply 2 pass: %f" % (time.time() - timer0))
Here is the output of the script:
$ ./matmul_benchmark.py
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
[GCC 5.4.0 20160609]
TensorFlow 1.3.0
numpy multiply: 37.464786
tensorflow multiply 1 pass: 61.245776
tensorflow multiply 2 pass: 49.944690
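The two TensorFlow passes above differ by more than 11 seconds, so a single timing per library may be noisy. Below is a minimal sketch of a warm-up plus best-of-N timing helper, using only the standard library. The name `best_of` and its parameters are hypothetical, not part of the script above, and the workload in the demo is only a stand-in.

```python
import time


def best_of(fn, repeats=3, warmup=1):
    """Call fn() `warmup` times untimed, then return the fastest of `repeats` timed calls."""
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as lazy initialization
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in workload; replace with the operation being measured.
elapsed = best_of(lambda: sum(i * i for i in range(100000)))
print("best of 3: %f" % elapsed)
```

In the benchmark, `fn` could be `lambda: sess.run(x3.op)` for TensorFlow or `lambda: np.dot(np.dot(rand_array, rand_array), rand_array)` for NumPy, so that both sides are measured the same way.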
The script consumes about 4 GB of RAM while running, so you might want to reduce the size variable to 4096.
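As a rough check on the memory figure, here is a back-of-the-envelope sketch. It assumes dense float64 matrices at 8 bytes per element and counts only a single matrix; `matrix_bytes` is a hypothetical helper, not part of the script, and the real footprint depends on how many inputs, intermediates, and results each library keeps alive at once.

```python
MIB = 1024 ** 2


def matrix_bytes(size, itemsize=8):
    """Bytes needed for one dense size x size matrix (itemsize=8 assumes float64)."""
    return size * size * itemsize


for size in (8192, 4096):
    print("size=%d: %d MiB per matrix" % (size, matrix_bytes(size) // MIB))
```

Each side of the benchmark holds an input, an intermediate product, and a result, so several such matrices exist at the same time. Halving size cuts each of them by a factor of four.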
The comparison shows NumPy ahead by roughly 35% (about 50 s for TensorFlow vs. 37 s for NumPy).
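For reference, the same comparison can be done with the unrounded timings from the output above. This is only a sketch: `slowdown_percent` and `gflops` are hypothetical helpers, and the throughput estimate assumes the usual count of 2*n^3 floating-point operations per n x n matrix product, which is an assumption about the underlying algorithm rather than something measured.

```python
# Timings copied from the script output above, in seconds.
numpy_s = 37.464786
tf_first_s = 61.245776
tf_second_s = 49.944690
size = 8192


def slowdown_percent(slow, fast):
    """How much longer `slow` took than `fast`, in percent."""
    return (slow / fast - 1.0) * 100.0


def gflops(seconds, n=size, matmuls=2):
    """Estimated throughput, assuming 2*n^3 operations per n x n matmul."""
    return matmuls * 2 * n ** 3 / seconds / 1e9


print("TF 1st pass vs numpy: +%.1f%%" % slowdown_percent(tf_first_s, numpy_s))
print("TF 2nd pass vs numpy: +%.1f%%" % slowdown_percent(tf_second_s, numpy_s))
print("numpy: %.1f GFLOP/s" % gflops(numpy_s))
print("TF 2nd pass: %.1f GFLOP/s" % gflops(tf_second_s))
```

With the unrounded numbers the second-pass gap comes out slightly below the figure obtained from the rounded 50 / 37.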
Is there a mistake in this test?
P.S. My CPU is a Sandy Bridge; its flags are:
$ lscpu | grep Flags
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx
rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable
nonstop_tsc aperfmperf pni pclmulqdq ssse3 cx16 **sse4_1 sse4_2** popcnt aes
xsave **avx** hypervisor lahf_lm epb xsaveopt dtherm ida arat pln pts