2

(Very long error messages in the question below. TL;DR, here is the specific question: Why does this test code not execute on the TX1's GPU, and what do I need to do to make it do so?)

I have just flashed and installed a brand new Nvidia Jetson TX1, with JetPack 2.3. I am trying to get Theano installed on the TX1 in such a way as to enable the use of the on-board GPU, for further machine learning and neural network applications.

However, I cannot seem to get the GPU itself to work.

The install of Theano was taken from here:

sudo apt-get install python-numpy python-scipy python-dev python-pip python-nose g++ libblas-dev git
pip install --upgrade --no-deps git+git://github.com/Theano/Theano.git --user  # Need Theano 0.8(not yet released) or more recent

Theano version installed was 0.9.0.dev2, python is version 2.7.12.

I used the test script from here :

from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
if numpy.any([isinstance(x.op, tensor.Elemwise) and
              ('Gpu' not in type(x.op).__name__)
              for x in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

When running as recommended:

THEANO_FLAGS=device=cuda0 python gpu_tutorial1.py

I get the following response, full of errors, warnings, and an execution on the CPU rather than the GPU:

ERROR (theano.gpuarray): pygpu was configured but could not be imported
Traceback (most recent call last):
  File "/home/ubuntu/.local/lib/python2.7/site-packages/theano/gpuarray/__init__.py", line 21, in <module>
    import pygpu
ImportError: No module named pygpu
WARNING (theano.gof.cmodule): OPTIMIZATION WARNING: Theano was not able to find the default g++ parameters. This is needed to tune the compilation to your specific CPU. This can slow down the execution of Theano functions. Please submit the following lines to Theano's mailing list so that we can fix this problem:
 ['# 1 "<stdin>"\n', '# 1 "<built-in>"\n', '# 1 "<command-line>"\n', '# 1 "/usr/include/stdc-predef.h" 1 3 4\n', '# 1 "<command-line>" 2\n', '# 1 "<stdin>"\n', 'Using built-in specs.\n', 'COLLECT_GCC=/usr/bin/g++\n', 'Target: aarch64-linux-gnu\n', "Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-arm64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-arm64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-arm64 --with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu\n", 'Thread model: posix\n', 'gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.2) \n', "COLLECT_GCC_OPTIONS='-E' '-v' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'\n", ' /usr/lib/gcc/aarch64-linux-gnu/5/cc1 -E -quiet -v -imultiarch aarch64-linux-gnu - -mlittle-endian -mabi=lp64 -fstack-protector-strong -Wformat -Wformat-security\n', 'ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"\n', 'ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/5/../../../../aarch64-linux-gnu/include"\n', '#include "..." search starts here:\n', '#include <...> search starts here:\n', ' /usr/lib/gcc/aarch64-linux-gnu/5/include\n', ' /usr/local/include\n', ' /usr/lib/gcc/aarch64-linux-gnu/5/include-fixed\n', ' /usr/include/aarch64-linux-gnu\n', ' /usr/include\n', 'End of search list.\n', 'COMPILER_PATH=/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/\n', 'LIBRARY_PATH=/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../../lib/:/lib/aarch64-linux-gnu/:/lib/../lib/:/usr/lib/aarch64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../:/lib/:/usr/lib/\n', "COLLECT_GCC_OPTIONS='-E' '-v' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'\n"]
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 12.736936 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
  1.62323285]
Used the cpu

When I change the device flag to 'gpu':

THEANO_FLAGS=device=gpu python gpu_tutorial1.py

things improve somewhat, in that the NVIDIA Tegra X1 is at least found, although it is ultimately not used:

Using gpu device 0: NVIDIA Tegra X1 (CNMeM is disabled, cuDNN 5105)
WARNING (theano.gof.cmodule): OPTIMIZATION WARNING: Theano was not able to find the default g++ parameters. This is needed to tune the compilation to your specific CPU. This can slow down the execution of Theano functions. Please submit the following lines to Theano's mailing list so that we can fix this problem:
 ['# 1 "<stdin>"\n', '# 1 "<built-in>"\n', '# 1 "<command-line>"\n', '# 1 "/usr/include/stdc-predef.h" 1 3 4\n', '# 1 "<command-line>" 2\n', '# 1 "<stdin>"\n', 'Using built-in specs.\n', 'COLLECT_GCC=/usr/bin/g++\n', 'Target: aarch64-linux-gnu\n', "Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.2' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-arm64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-arm64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-arm64 --with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu\n", 'Thread model: posix\n', 'gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.2) \n', "COLLECT_GCC_OPTIONS='-E' '-v' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'\n", ' /usr/lib/gcc/aarch64-linux-gnu/5/cc1 -E -quiet -v -imultiarch aarch64-linux-gnu - -mlittle-endian -mabi=lp64 -fstack-protector-strong -Wformat -Wformat-security\n', 'ignoring nonexistent directory "/usr/local/include/aarch64-linux-gnu"\n', 'ignoring nonexistent directory "/usr/lib/gcc/aarch64-linux-gnu/5/../../../../aarch64-linux-gnu/include"\n', '#include "..." search starts here:\n', '#include <...> search starts here:\n', ' /usr/lib/gcc/aarch64-linux-gnu/5/include\n', ' /usr/local/include\n', ' /usr/lib/gcc/aarch64-linux-gnu/5/include-fixed\n', ' /usr/include/aarch64-linux-gnu\n', ' /usr/include\n', 'End of search list.\n', 'COMPILER_PATH=/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/\n', 'LIBRARY_PATH=/usr/lib/gcc/aarch64-linux-gnu/5/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../aarch64-linux-gnu/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../../lib/:/lib/aarch64-linux-gnu/:/lib/../lib/:/usr/lib/aarch64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/aarch64-linux-gnu/5/../../../:/lib/:/usr/lib/\n', "COLLECT_GCC_OPTIONS='-E' '-v' '-shared-libgcc' '-mlittle-endian' '-mabi=lp64'\n"]
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
Looping 1000 times took 12.820628 seconds
Result is [ 1.23178032  1.61879341  1.52278065 ...,  2.20771815  2.29967753
  1.62323285]
Used the cpu

I do plan to send the warning lines to the Theano mailing list, but that warning seems unrelated to what is currently my main issue: Why does this test code not execute on the TX1's GPU, and what do I need to do to make it do so?

talonmies
  • 70,661
  • 34
  • 192
  • 269
Novak
  • 4,687
  • 2
  • 26
  • 64

1 Answers1

2

It turns out that the recommended CLI invocation on that site is not correct. The correct invocation is:

THEANO_FLAGS='device=gpu,floatX=float32' python gpu_tutorial1.py

This is sufficient to execute on the GPU with a satisfactory speedup (noticeable and reported in the output) and to get rid of that gob-smackingly long error warning.

Putting both those flags in a .theanorc file is also sufficient, and simplifies the invocation.

Novak
  • 4,687
  • 2
  • 26
  • 64
  • For future users, `device=gpu` is a deprecated option, it is `cuda*` now. https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29 – tandem Nov 22 '17 at 13:23