How do I create a debug build of a recent TensorFlow version with CUDA support?

Question

I have tried repeatedly to create a debug build of a recent TensorFlow version using the official Docker images (latest-devel-gpu-py3 -> r1.12.0), but nothing seems to work. Has anyone recently created a successful debug build of TensorFlow (>= r1.11.0) and can share their approach?

This is what I tried so far.

I basically tried to follow the instructions at https://www.tensorflow.org/install/source, but tried to modify them to generate a debug build. Nothing I tried resulted in a successful build.

The host system is a Linux x86-64 machine with plenty of RAM (a DGX-1 with 512 GB). The CUDA version inside the Docker image is 9.0, and the TensorFlow version that the "latest" image currently contains is r1.12.0.

To get any CUDA build working at all, I needed to use "nvidia-docker"; otherwise I get a linker error involving "libcuda.so.1".

I started like this:

nvidia-docker pull tensorflow/tensorflow:latest-devel-gpu-py3
nvidia-docker run --runtime=nvidia -it -w /tensorflow -v $PWD:/mnt -e HOST_PERMS="$(id -u):$(id -g)" \
    tensorflow/tensorflow:latest-devel-gpu-py3 bash

Then I tried to configure the project using

cd /tensorflow
./configure

I tried various configurations: keeping all values at their defaults, enabling only the parts I need, pointing it to my own CUDA 9.0 and TensorRT installation, and not running ./configure at all. Not running ./configure at all (inside the Docker image) seems to produce the best results, i.e. optimized builds succeed with the least effort.

If I build it using the exact official build instructions, i.e. creating an optimized/non-debug build, everything works as expected. So running the following seems to succeed.

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

The following also succeeds. It includes debug info but does not turn off optimization, so I cannot really use it for debugging.

bazel build --config cuda --strip=never -c opt --copt="-ggdb"  //tensorflow/tools/pip_package:build_pip_package

But nothing that disables optimizations seems to work. If I run the following (with or without the --strip=never flag)

bazel build --config cuda --strip=never -c dbg //tensorflow/tools/pip_package:build_pip_package

I arrive at the following error:

INFO: From Compiling tensorflow/contrib/framework/kernels/zero_initializer_op_gpu.cu.cc: external/com_google_absl/absl/strings/string_view.h(496): error: constexpr function return is non-constant

This can be resolved by defining -DNDEBUG (see "nvcc error: string_view.h: constexpr function return is non-constant").

But if I run the following:

bazel build --config cuda --strip=never -c dbg --copt="-DNDEBUG"  //tensorflow/tools/pip_package:build_pip_package

I get these linking errors at the final step of the build:

ERROR: /tensorflow/python/BUILD:3865:1: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed (Exit 1)
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so: PC-relative offset overflow in GOT PLT entry for `_ZNK5Eigen10TensorBaseINS_9TensorMapINS_6TensorIKjLi1ELi1EiEELi16ENS_11MakePointerEEELi0EE9unaryExprINS_8internal11scalar_leftIjjN10tensorflow7functor14right_shift_opIjEEEEEEKNS_18TensorCwiseUnaryOpIT_KS6_EERKSH_'
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build

I hoped to be able to solve that by doing a monolithic build. So I tried that, and got essentially the same error.

bazel build --config cuda -c dbg --config=monolithic --copt="-DNDEBUG"  //tensorflow/tools/pip_package:build_pip_package

I also tried the approaches from "TensorFlow doesnt build with debug mode" and several other variants I found through extensive googling. I'm running out of options.

I'd take any TensorFlow version from 1.11 onwards, including (working) nightly builds. It just needs to work with CUDA 9 on x86 Linux, include debug symbols, and have optimizations disabled.

Thank you very much in advance.

Kai Londenberg · Answer 1 · 2018-11-11T10:16:35.140

In case someone else stumbles over this problem: I finally got it to compile using the following command:

bazel build --config cuda --strip=never --copt="-DNDEBUG" --copt="-march=native" --copt="-Og" --copt="-g3" --copt="-mcmodel=medium" --copt="-fPIC"  //tensorflow/tools/pip_package:build_pip_package
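For anyone who wants to toggle individual flags while experimenting, here is a minimal sketch that assembles the same command from a list of compiler options. The function name debug_build_command is made up for illustration, and the per-flag comments reflect the general GCC meaning of each option, not a confirmed explanation of why this combination gets past the linker errors. Note that the command does not pass -c dbg; the debug information comes from the -g3 copt.

```shell
#!/bin/sh
# Hypothetical helper (not part of TensorFlow or Bazel): prints the bazel
# invocation from the answer, with one --copt per argument.
debug_build_command() {
    cmd="bazel build --config cuda --strip=never"
    for flag in "$@"; do
        cmd="$cmd --copt=$flag"
    done
    printf '%s %s\n' "$cmd" "//tensorflow/tools/pip_package:build_pip_package"
}

# -DNDEBUG          works around the absl string_view constexpr error under nvcc
# -march=native     tunes code for the build machine's CPU
# -Og               GCC's "optimize for debugging" level
# -g3               emits debug info, including macro definitions
# -mcmodel=medium   GCC x86-64 code model; presumably aimed at the relocation overflows
# -fPIC             position-independent code for the shared object
debug_build_command -DNDEBUG -march=native -Og -g3 -mcmodel=medium -fPIC
```

The script only prints the command, so it can be inspected before being run by hand inside the container.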

After that, installation is a bit of a hassle, since the wheel can no longer be built. The TensorFlow build can still be installed, though.

Building the wheel via

./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg

fails with an error that seems to come from Python's built-in zip compression library (i.e. it cannot compress the resulting archive because it is too large).

It's important to run it anyway, since it only fails at the final step (archiving). Right at the start, build_pip_package prints to the console that it is building the package in a temporary directory (say, /tmp/Shjwejweu). The contents of that temporary directory can be used to install the debug version of TensorFlow. Simply copy the directory to the target machine, make sure any old tensorflow package is removed (e.g. pip uninstall tensorflow), and run the following inside it:

python setup.py install

Be careful to explicitly uninstall the "tensorflow" package first; otherwise you can end up with two TensorFlow versions installed at the same time.
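The install steps above could be wrapped up roughly as follows. This is only a sketch: the function name install_from_staging_dir is made up for illustration, /tmp/Shjwejweu stands in for whatever directory build_pip_package actually printed, and it assumes pip and python on the PATH belong to the environment you want to install into.

```shell
#!/bin/sh
# Hypothetical helper: install TensorFlow from the temporary directory that
# build_pip_package leaves behind after the archiving step has failed.
install_from_staging_dir() {
    staging_dir="$1"
    # Refuse to touch the environment unless the directory looks like a package source.
    if [ ! -f "$staging_dir/setup.py" ]; then
        echo "no setup.py found in '$staging_dir'" >&2
        return 1
    fi
    # Remove any existing package first so that two versions do not coexist.
    pip uninstall -y tensorflow
    # Run the install from inside the directory; the subshell keeps the caller's cwd.
    (cd "$staging_dir" && python setup.py install)
}

# Example call, with the placeholder directory name from the text:
# install_from_staging_dir /tmp/Shjwejweu
```

The setup.py check runs before the uninstall, so a mistyped directory does not remove a working installation.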
