I have tried repeatedly to create a debug build of a recent TensorFlow version using the official Docker images (latest-devel-gpu-py3 -> r1.12.0), but nothing seems to work. Has anyone recently created a successful debug build of TensorFlow (>= r1.11.0) and can share their approach?
This is what I tried so far.
I basically tried to follow the instructions at https://www.tensorflow.org/install/source, but tried to modify them to generate a debug build. Nothing I tried resulted in a successful build.
The host system is a Linux x86-64 machine with plenty of RAM (a DGX-1 with 512 GB). The CUDA version inside the Docker image is 9.0, and the TensorFlow version in the current "latest" image is r1.12.0.
To get any CUDA build working, I had to use "nvidia-docker"; otherwise I get a linker error involving "libcuda.so.1".
I started like this:
nvidia-docker pull tensorflow/tensorflow:latest-devel-gpu-py3
nvidia-docker run --runtime=nvidia -it -w /tensorflow -v $PWD:/mnt -e HOST_PERMS="$(id -u):$(id -g)" \
tensorflow/tensorflow:latest-devel-gpu-py3 bash
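Because the "libcuda.so.1" linker error depends on whether the driver library is visible inside the container, a quick check before starting a long build may help. The following is only a sketch under my own assumptions: `lib_visible` is a made-up helper name, and it just scans `ldconfig -p`-style output for a library name.

```shell
#!/bin/sh
# Hypothetical helper (not part of the TensorFlow instructions): succeeds if
# the given library name appears in `ldconfig -p`-style output read from stdin.
lib_visible() {
  grep -F -q -- "$1"
}

# Inside the container, report whether the dynamic linker cache lists libcuda.so.1.
# The check is skipped if ldconfig is not on PATH.
if command -v ldconfig >/dev/null 2>&1; then
  if ldconfig -p | lib_visible libcuda.so.1; then
    echo "libcuda.so.1 found"
  else
    echo "libcuda.so.1 missing"
  fi
fi
```

If this prints "libcuda.so.1 missing", the container was most likely started without the NVIDIA runtime.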
Then I tried to configure the project using
cd /tensorflow
./configure
I tried various configurations: keeping all values at their defaults, enabling only the parts I need, pointing it to my own CUDA 9.0 and TensorRT installation, and not running ./configure at all. Not running ./configure at all (inside the Docker image) seems to produce the best results, i.e. optimized builds succeed with the least effort.
If I follow the official build instructions exactly, i.e. create an optimized (non-debug) build, everything works as expected. The following succeeds:
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
The same is true for the following, which includes debug info but does not turn off optimization (i.e. I cannot really use it for debugging):
bazel build --config cuda --strip=never -c opt --copt="-ggdb" //tensorflow/tools/pip_package:build_pip_package
But anything that disables optimizations does not seem to work. If I run the following (with or without the --strip=never flag)
bazel build --config cuda --strip=never -c dbg //tensorflow/tools/pip_package:build_pip_package
I arrive at the following error:
INFO: From Compiling tensorflow/contrib/framework/kernels/zero_initializer_op_gpu.cu.cc:
external/com_google_absl/absl/strings/string_view.h(496): error: constexpr function return is non-constant
This can be resolved by defining -DNDEBUG (see the question "nvcc error: string_view.h: constexpr function return is non-constant").
But if I run the following:
bazel build --config cuda --strip=never -c dbg --copt="-DNDEBUG" //tensorflow/tools/pip_package:build_pip_package
I get these linking errors at the final step of the build:
ERROR: /tensorflow/python/BUILD:3865:1: Linking of rule '//tensorflow/python:_pywrap_tensorflow_internal.so' failed (Exit 1)
/usr/lib/gcc/x86_64-linux-gnu/5/../../../x86_64-linux-gnu/crti.o: In function `_init':
(.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `deregister_tm_clones':
crtstuff.c:(.text+0x3): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0xa): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so
crtstuff.c:(.text+0x1e): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_deregisterTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `register_tm_clones':
crtstuff.c:(.text+0x43): relocation truncated to fit: R_X86_64_PC32 against `.tm_clone_table'
crtstuff.c:(.text+0x4a): relocation truncated to fit: R_X86_64_PC32 against symbol `__TMC_END__' defined in .nvFatBinSegment section in bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so
crtstuff.c:(.text+0x6b): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `_ITM_registerTMCloneTable'
/usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0x92): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x9c): relocation truncated to fit: R_X86_64_GOTPCREL against symbol `__cxa_finalize@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
crtstuff.c:(.text+0xaa): relocation truncated to fit: R_X86_64_PC32 against symbol `__dso_handle' defined in .data.rel.local section in /usr/lib/gcc/x86_64-linux-gnu/5/crtbeginS.o
crtstuff.c:(.text+0xbb): additional relocation overflows omitted from the output
bazel-out/k8-dbg/bin/tensorflow/python/_pywrap_tensorflow_internal.so: PC-relative offset overflow in GOT PLT entry for `_ZNK5Eigen10TensorBaseINS_9TensorMapINS_6TensorIKjLi1ELi1EiEELi16ENS_11MakePointerEEELi0EE9unaryExprINS_8internal11scalar_leftIjjN10tensorflow7functor14right_shift_opIjEEEEEEKNS_18TensorCwiseUnaryOpIT_KS6_EERKSH_'
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build
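Since the builds take a long time and the logs are huge, I find it handy to check which of the two failures a given log contains. This is just a sketch: `classify_build_log` and the category names are made up for illustration, and the match strings are copied from the error messages quoted in this post.

```shell
#!/bin/sh
# Hypothetical helper: reads a bazel build log on stdin and prints which of
# the two failures described in this post it contains (linker error first).
classify_build_log() {
  log="$(cat)"
  case "$log" in
    *"relocation truncated to fit"*)
      echo "linker-relocation-overflow" ;;
    *"constexpr function return is non-constant"*)
      echo "nvcc-constexpr" ;;
    *)
      echo "other" ;;
  esac
}

# Example usage (build.log is an assumed file name):
#   bazel build ... 2>&1 | tee build.log
#   classify_build_log < build.log
```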
I hoped that a monolithic build would avoid these linker errors, so I tried the following, but got essentially the same error.
bazel build --config cuda -c dbg --config=monolithic --copt="-DNDEBUG" //tensorflow/tools/pip_package:build_pip_package
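To keep the attempts easy to compare, here they are collected in a small dry-run helper. It only prints the command lines and never invokes bazel. The function name `tf_build_cmd` and the variant labels are made up for this sketch; the flags are exactly the ones quoted above.

```shell
#!/bin/sh
# Hypothetical helper: prints the bazel command line for each variant tried
# above. It does not run bazel; the labels (opt, dbg, ...) are made up here.
TARGET="//tensorflow/tools/pip_package:build_pip_package"

tf_build_cmd() {
  case "$1" in
    opt)         printf '%s\n' "bazel build --config=opt --config=cuda $TARGET" ;;
    opt-symbols) printf '%s\n' "bazel build --config cuda --strip=never -c opt --copt=\"-ggdb\" $TARGET" ;;
    dbg)         printf '%s\n' "bazel build --config cuda --strip=never -c dbg $TARGET" ;;
    dbg-ndebug)  printf '%s\n' "bazel build --config cuda --strip=never -c dbg --copt=\"-DNDEBUG\" $TARGET" ;;
    monolithic)  printf '%s\n' "bazel build --config cuda -c dbg --config=monolithic --copt=\"-DNDEBUG\" $TARGET" ;;
    *)           echo "unknown variant: $1" >&2; return 1 ;;
  esac
}

# Print every variant with its label.
for v in opt opt-symbols dbg dbg-ndebug monolithic; do
  printf '%-12s %s\n' "$v" "$(tf_build_cmd "$v")"
done
```

Only "opt" and "opt-symbols" build successfully for me. "dbg" fails with the nvcc constexpr error, and "dbg-ndebug" and "monolithic" fail with the relocation errors at link time.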
I also tried the approaches from the question "TensorFlow doesnt build with debug mode" and several other variants I found through extensive googling. I'm running out of options.
I'd take any TensorFlow version from 1.11 onwards, including (working) nightly builds. It just needs to work with CUDA 9 on x86 Linux, include debug symbols, and have optimizations disabled.
Thank you very much in advance.