
I have a Red Hat x86_64 server and I am trying to compile a CUDA GPU Data Management System (GDBMS).

CUDA (7.5) is already set up, and CUDAPATH and LD_LIBRARY_PATH are set to the corresponding installation paths (checked).

I have the following Makefile:

-bash-4.1$ cat Makefile 
# Micro test cases
TESTS = test_init_fini test_malloc test_cow test_memcpy test_memset \
        test_launch test_ptarray test_evict_local

# Paths
CUDAPATH = /usr/local/cuda
MQXPATH := `pwd`/../../src
TMPPATH = ./tmp

# Compiler/linker settings
NVCC := $(CUDAPATH)/bin/nvcc
CFLAGS := -c --compiler-options -Wall -arch=sm_20 -I$(CUDAPATH)/include -I$(MQXPATH) -Xcompiler '-fPIC' -dc
LDFLAGS := -L$(CUDAPATH)/lib64 -L$(MQXPATH) -Xlinker -rpath=$(MQXPATH) -lmqx -Xcompiler '-fPIC' -dlink

.DEFAULT_GOAL := all
.SECONDEXPANSION:
.PHONY : all test setup cleanup $(TESTS)

TESTBINS := $(addprefix $(TMPPATH)/,$(TESTS))

all : $(TMPPATH) $(TESTBINS)

$(TMPPATH) :
    @mkdir -p $(TMPPATH)

$(TESTBINS) : $$@.o
    @./tcgen.py $<
    @$(NVCC) $(CFLAGS) main.cu -o $(TMPPATH)/main.o
    $(NVCC) $(LDFLAGS) $(TMPPATH)/main.o $< -o $@
    -@rm $(TMPPATH)/main.o

$(TMPPATH)/%.o : %.cu
    $(NVCC) $(CFLAGS) $< -o $@

# No rules for source files
%.c : ;

$(TESTS) : $(TMPPATH)/$$@
    @echo "================================================================"
    @LD_PRELOAD=$(MQXPATH)/libmqx.so $(TMPPATH)/$@
    @echo ""

test : setup $(TESTS) cleanup

setup:
    @$(MQXPATH)/mqxctl --start -v
    @echo ""

cleanup:
    @$(MQXPATH)/mqxctl --stop -v

clean:
    -@rm $(TESTBINS) $(TMPPATH)/*.o testcases.h
    -@rm -r $(TMPPATH)

As you can see, running this Makefile produces some files in the tmp/ directory. These are test programs that must pass in order to validate the application. The compilation succeeds, but then something strange happens that I cannot explain: none of the files produced in tmp/ is executable, so the test phase cannot be completed.

More concretely, after make succeeds, I need to run make test in order to run the tests. However, I get the error:

/bin/sh: ./tmp/test_init_fini: cannot execute binary file
make: *** [test_init_fini] Error 126

test_init_fini is one of the produced files and, as you can see below, it is not executable.

-bash-4.1$ file *
test_cow:           ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_cow.o:         ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_evict_local:   ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_evict_local.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_init_fini:     ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_init_fini.o:   ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_launch:        ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_launch.o:      ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_malloc:        ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_malloc.o:      ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_memcpy:        ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_memcpy.o:      ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_memset:        ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_memset.o:      ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_ptarray:       ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
test_ptarray.o:     ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
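
For reference, the word "relocatable" in the listing above is the ELF file type of an unlinked object file, and such a file cannot be executed regardless of its permission bits. The sketch below reads that type directly from the ELF header as a cross-check on `file`. The function name `elf_type` is made up for illustration, and it assumes a little-endian ELF file such as the x86-64 ones listed here; it does not verify the ELF magic bytes.

```shell
# elf_type FILE: print the ELF file type of FILE.
# Hypothetical helper; assumes a little-endian ELF file (true for x86-64)
# and does not check that FILE really starts with the ELF magic.
elf_type() {
    # e_type is a 2-byte field at offset 16 of the ELF header; for the
    # common types the value fits in the first (low-order) byte.
    t=$(od -An -tu1 -j16 -N1 "$1" | tr -d ' \n')
    case "$t" in
        1) echo "relocatable" ;;   # ET_REL: object file, not yet linked
        2) echo "executable" ;;    # ET_EXEC: linked executable
        3) echo "shared" ;;        # ET_DYN: shared object or PIE executable
        *) echo "unknown" ;;
    esac
}
```

Calling `elf_type tmp/test_init_fini` on the files listed above should agree with `file` and report them as relocatable, meaning the final link step never ran.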

So what could be wrong with my Makefile that prevents it from producing executable files? I have read that the -c flag in GCC might cause the problem, but when I remove it I get the same result.

*Note that the mqxctl binary referenced in the Makefile was produced by an earlier build step that succeeded, and it works fine (checked).

I would be glad to provide more information if needed. Any ideas? Thank you very much for your patience.

[EDIT]

More info: the GDBMS that I am trying to experiment with is MultiQx. If you have a quick look at its README file, you will see that it was created and tested with CUDA 5.0, so the authors do not guarantee that it works with newer versions.

"We recommend installing CUDA SDK 5.0, which is known to work with libmqx. Newer versions of CUDA SDKs may have some linking problems with libmqx"

As mentioned previously, my server has CUDA 7.5 installed (and I am not allowed to change it).

When I first tried the Makefile in its default form, I got the following error when running make in the tests/micro/ directory (make in the src/ folder had succeeded):

    /usr/local/cuda/bin/nvcc -L/usr/local/cuda/lib64 -L`pwd`/../../src -Xlinker -rpath=`pwd`/../../src -lmqx ./tmp/main.o tmp/test_init_fini.o -o tmp/test_init_fini
/usr/bin/ld: tmp/test_init_fini: hidden symbol `cudaFreeHost' in /usr/local/cuda/lib64/libcudart_static.a(libcudart_static.a.o) is referenced by DSO
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make: *** [tmp/test_init_fini] Error 1

As a result, I started looking online for possible solutions to this error, and after reading some other posts I modified my Makefile (the original Makefile can be downloaded from the link mentioned above; I only added some CFLAGS and LDFLAGS entries) so that the compilation phase passed.

Then I got stuck in the testing phase.

[2nd EDIT] According to @RobertCrovella's comment, I definitely have to get rid of the -dlink flag, which is fine. So now, without the -dlink flag, I am back to solving the following error:

    /usr/local/cuda/bin/nvcc -L/usr/local/cuda/lib64 -L`pwd`/../../src -Xlinker -rpath=`pwd`/../../src -lmqx ./tmp/main.o tmp/test_init_fini.o -o tmp/test_init_fini
/usr/bin/ld: tmp/test_init_fini: hidden symbol `cudaFreeHost' in /usr/local/cuda/lib64/libcudart_static.a(libcudart_static.a.o) is referenced by DSO
/usr/bin/ld: final link failed: Nonrepresentable section on output
collect2: ld returned 1 exit status
make: *** [tmp/test_init_fini] Error 1

Why can't cudaFreeHost be seen?

dinosaur
  • `-Xcompiler '-fPIC'` looks suspicious. Could you remove that from `LDFLAGS` and see if it helps? – kaylum Oct 06 '16 at 23:59
  • Such error messages are often produced when the interpreter specified in the ELF file is something weird. An easy way to check that out is to look at the beginning of the file with, say, just less and seeing a string that starts with `/lib` in the first kilobyte or so of the file - is that something sensible looking? – Petr Baudis Oct 07 '16 at 00:11
  • Thanks for replying. I removed `-Xcompiler '-fPIC'` but I still get exactly the same output, and I think there is no '/lib' anywhere in the file. :/ – dinosaur Oct 07 '16 at 00:27
  • How did you come by this makefile? If you wrote it yourself, what was the last change before it began misbehaving? Can you simplify it, and give us a [minimal complete example](http://stackoverflow.com/help/mcve)? – Beta Oct 07 '16 at 00:48
  • get rid of `-dlink` – Robert Crovella Oct 07 '16 at 00:59
  • @RobertCrovella Unfortunately, getting rid of -dlink brings back the original link error: ``/usr/local/cuda/bin/nvcc -L/usr/local/cuda/lib64 -L`pwd`/../../src -Xlinker -rpath=`pwd`/../../src -lmqx ./tmp/main.o tmp/test_init_fini.o -o tmp/test_init_fini`` fails with ``/usr/bin/ld: tmp/test_init_fini: hidden symbol 'cudaFreeHost' in /usr/local/cuda/lib64/libcudart_static.a(libcudart_static.a.o) is referenced by DSO`` and ``/usr/bin/ld: final link failed: Nonrepresentable section on output``. This is the error that made me modify the original Makefile in the first place. @Beta I have added some more info to the original post – dinosaur Oct 07 '16 at 08:34
  • As you're discovering, adding `-dlink` did not solve your problem. The reason it appeared to fix your final-link problem is because when you add `-dlink`, that command *no longer performs* final link. So obviously it makes the final link problem "go away". But it's not a solution. You can discover this by reading the nvcc manual. – Robert Crovella Oct 07 '16 at 14:16
  • @RobertCrovella Sure, thank you very much for the info. In the meantime, if anyone has an idea or has experienced a similar problem, feel free to comment :) – dinosaur Oct 07 '16 at 14:29
  • @RobertCrovella I have searched a lot in the documentation. I know what my error is about, but I cannot explain it (until now I did not deal with it because I mistakenly thought I had solved it, as you said). The error says that the cudaFreeHost function cannot be seen at link time, which means we have a linker error. The weird part, however, is that in the original Makefile CUDAPATH and LD_LIBRARY_PATH are both set, so I was not expecting such an error, since cudaFreeHost is a standard CUDA runtime function. What else could it be? :/ – dinosaur Oct 07 '16 at 15:22
  • The error says that the compiler *can* find the function, but the function is set to not be linked dynamically. Maybe try to link it statically? Check this: http://stackoverflow.com/questions/23696585/what-does-exactly-the-warning-mean-about-hidden-symbol-being-referenced-by-dso – CygnusX1 Oct 10 '16 at 03:43
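
Following up on the last comment: the error message shows that the executable is being linked against libcudart_static.a, whose symbols are hidden, so a shared library such as libmqx.so cannot resolve cudaFreeHost against that copy. One thing that may be worth trying is to drop -dlink and ask nvcc for the shared runtime with `--cudart shared`. The sketch below only prints the resulting link command for one test so that it can be inspected before running it. `link_cmd` is a made-up helper, the paths are the ones from the Makefile in the question, and whether this alone is enough depends on how libmqx.so itself was built, which is an assumption here.

```shell
# Sketch: print (do not run) a final-link command that uses the shared
# CUDA runtime instead of libcudart_static.a. Paths mirror the Makefile.
CUDAPATH=${CUDAPATH:-/usr/local/cuda}
MQXPATH=${MQXPATH:-$(pwd)/../../src}

# link_cmd NAME: hypothetical helper; NAME is a test such as test_init_fini.
link_cmd() {
    # No -dlink here: with that flag nvcc does not perform the final link.
    # --cudart shared asks nvcc to link against libcudart.so.
    echo "$CUDAPATH/bin/nvcc --cudart shared" \
         "-L$CUDAPATH/lib64 -L$MQXPATH -Xlinker -rpath=$MQXPATH" \
         "tmp/main.o tmp/$1.o -lmqx -o tmp/$1"
}

link_cmd test_init_fini
```

In the Makefile this would correspond to replacing `-dlink` with `--cudart shared` in LDFLAGS. If libmqx.so was itself linked against the static runtime, it may need to be rebuilt the same way.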
