-1

I dlopen and dlclose two different shared objects (.so files) a couple of times, and eventually the program blocks in dlopen.

When it hangs in dlopen it prints nothing, CPU idle drops to 0%, and I cannot quit with Ctrl+C.

LOG_TRACE("attaching...");
handle = dlopen(plugin_path.c_str(), RTLD_LAZY);
LOG_DEBUG("dlopen called");     // not printed when the hang occurs, after a couple of successful rounds

Then I attached gdb to the process:

(gdb) bt
#0  0x0000002a960dbe60 in tcmalloc::ThreadCache::InitTSD () at src/thread_cache.cc:321
#1  0x0000002a960d51bf in TCMallocGuard (this=Variable "this" is not available.) at src/tcmalloc.cc:908
#2  0x0000002a960d5e00 in global constructors keyed to _ZN61FLAG__namespace_do_not_use_directly_use_DECLARE_int64_instead43FLAGS_tcmalloc_large_alloc_report_thresholdE () at src/tcmalloc.cc:935
#3  0x0000002a960fafc6 in __do_global_ctors_aux () at ./src/base/spinlock.h:54
#4  0x0000002a96010f13 in _init () from ../plugins/libmonitor.so
#5  0x0000002a00000000 in ?? ()
#6  0x000000302ad0acaf in _dl_init_internal () from /lib64/ld-linux-x86-64.so.2
#7  0x000000302aff725c in dl_open_worker () from /lib64/tls/libc.so.6
#8  0x000000302ad0aa60 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#9  0x000000302aff79fa in _dl_open () from /lib64/tls/libc.so.6
#10 0x000000302b201054 in dlopen_doit () from /lib64/libdl.so.2
#11 0x000000302ad0aa60 in _dl_catch_error () from /lib64/ld-linux-x86-64.so.2
#12 0x000000302b201552 in _dlerror_run () from /lib64/libdl.so.2
#13 0x000000302b201092 in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#14 0x000000000041b559 in uap::meta::MetaManageServiceHandler::plugin_action (this=0xb26000, _return=@0x7fbffff500, plugin_name=@0x7fbffff4e0, plugin_path=@0x7fbffff570, t=Variable "t" is not available.)
at /usr/lib/gcc/x86_64-redhat-linux/3.4.5/../../../../include/c++/3.4.5/bits/basic_string.h:1456
#15 0x000000000041b0bc in uap::meta::MetaManageServiceHandler::plugin_action (this=0xb26000, _return=@0x7fbffff500, plugin_name=@0x7fbffff4e0, plugin_path=@0x7fbffff570, t=uap::meta::PluginActionType::RELOAD)
at server/service_handler.cpp:173
#16 0x0000000000417641 in uap::meta::test_Service_Handler_suite_test_case_manage_service_plugin_action_Test::TestBody (this=0xb16080) at test_load.cpp:73
#17 0x00000000004446c6 in testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void> (object=0xb16080, method={__pfn = 0x21, __delta = 0}, location=0x537f30 "the test body")
at ../../../../com/btest/gtest/src/gtest.cc:2744
#18 0x000000000042dd1c in testing::Test::Run (this=0xb16080) at ../../../../com/btest/gtest/src/gtest.cc:2766
#19 0x000000000042e8b4 in testing::TestInfo::Run (this=0xb17160) at ../../../../com/btest/gtest/src/gtest.cc:2958
#20 0x000000000042f415 in testing::TestCase::Run (this=0xb23000, runtype=0) at ../../../../com/btest/gtest/src/gtest.cc:3160
#21 0x0000000000436352 in testing::internal::UnitTestImpl::RunAllTests (this=0xb22000) at ../../../../com/btest/gtest/src/gtest.cc:5938
#22 0x0000000000434299 in testing::UnitTest::Run (this=0x6f4220, run_type=0) at ../../../../com/btest/gtest/src/gtest.cc:5449
#23 0x0000000000434268 in testing::UnitTest::Run (this=0x6f4220) at ../../../../com/btest/gtest/src/gtest.cc:5387
#24 0x0000000000455404 in main (argc=1, argv=0x7fbffff8c8) at ../../../../com/btest/gtest/src/gtest_main.cc:38

Actually, I have redefined these four functions as empty:

void __attribute__((constructor)) dlinit()
{
}

void __attribute__((destructor)) dldeinit()
{
}

void _init()
{
}

void _fini()
{
}
user2530422
  • What program exhibits this behavior, with what plugin? How were both the program and the plugin built? What is your `LD_LIBRARY_PATH` variable? Does your `plugin_path` contain a `/` ? What is its value? And you should add a `fflush(NULL);` after your code above! – Basile Starynkevitch Jul 04 '13 at 04:50
  • Thanks for the reply. I did not set LD_LIBRARY_PATH, and plugin_path is an absolute path starting with /. Is fflush necessary? I only added printf here; my real code does not use it. It is not an fflush issue, since the hang is in dlopen. – user2530422 Jul 04 '13 at 05:08
  • I don't understand how and why you believe that the code is hanging at `dlopen` time? Did you use the debugger ? Show the code of the plugin, and more code of your program. – Basile Starynkevitch Jul 04 '13 at 05:14
  • `LOG_TRACE("attaching..."); handle = dlopen(plugin_path.c_str(), RTLD_LAZY); LOG_DEBUG("dlopen called");` The last line produced no output, after a couple of successful rounds. – user2530422 Jul 04 '13 at 07:30
  • Use `valgrind` to check that your heap is not badly corrupted. – Basile Starynkevitch Jul 04 '13 at 07:37
  • It looks like it hangs in _init(), but I defined it as empty. – user2530422 Jul 04 '13 at 08:58
  • Your backtrace mentions some `malloc`; I suspect a heap corruption elsewhere. Please use `valgrind` ! – Basile Starynkevitch Jul 04 '13 at 09:10
  • Is there any other way besides this tool? – user2530422 Jul 04 '13 at 09:23
  • Why do you dislike `valgrind` ? You could also buy [purify](https://www.ibm.com/developerworks/rational/products/purifyplus/?S_TACT=105AGY59&S_CMP=09&ca=dtl-0903)! – Basile Starynkevitch Jul 04 '13 at 09:27
  • I ran "valgrind --leak-check=full xxx"; it reported nothing wrong, and my program still hangs there. – user2530422 Jul 04 '13 at 11:14
  • Suggestion: first debug your heap issues (using `valgrind` or something else) without using any plugin. – Basile Starynkevitch Jul 04 '13 at 13:01
  • I failed to find any useful information with valgrind: `--11221-- Discarding syms at 0x8ddc6e0-0x8eb5588 in /xxx/plugins/libsample.so due to munmap()` and `--11221-- Reading syms from /xxx/plugins/libsample.so`, which correspond to dlclose and dlopen; then it hangs with the same gdb backtrace as before. – user2530422 Jul 05 '13 at 04:03
  • I think I have found the root cause: according to the gdb backtrace, the hang comes from tcmalloc. I read the related tcmalloc code, which takes a couple of locks. When I compile and link the .so without tcmalloc, the hang no longer occurs. This looks like a tcmalloc bug when it is used inside a dlopen-ed .so. – user2530422 Jul 05 '13 at 07:35

2 Answers

1

I think I have found the root cause: according to the gdb backtrace, the hang comes from tcmalloc. I read the related tcmalloc code, which takes a couple of locks. When I compile and link the .so without tcmalloc, the hang no longer occurs. This looks like a tcmalloc bug when it is used inside a dlopen-ed .so.

user2530422
  • Do you know what the hang was? I am loading a library with dlopen which, for some reason, is opening a second library. Inside this load tcmalloc is live locking. I'll take a look through the source too, but if you already found the problem it'd be great to know. – ChrisAshton84 Oct 07 '13 at 18:24
0

You should compile both your application and your plugin with gcc -Wall -g and use the debugger gdb (don't forget to compile the plugin sources also with -fPIC and to link its object files with -shared).

As you probably know, dlopen-ing a shared object runs every function carrying the constructor function attribute (and also, as dlopen(3) says, the obsolete _init function). Constructors of C++ objects with static storage duration are run in that same initialization pass.
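To make the point about load-time constructors concrete, here is a minimal sketch. All names in it (`on_load`, `Guard`, `load_time_events`, and so on) are hypothetical and not taken from the plugin in the question. It only illustrates that both an attribute-constructor function and the constructor of a static C++ object run before any ordinary code gets control; the `TCMallocGuard` and "global constructors keyed to" frames in the backtrace above appear to be the second kind.

```cpp
// Hypothetical illustration; none of these names come from the question's plugin.

// Plain globals with constant initialization, so they are safe to touch
// from any load-time constructor regardless of initialization order.
static int  g_events          = 0;
static bool g_attr_ctor_ran   = false;
static bool g_static_ctor_ran = false;

// Runs when the containing executable or shared object is loaded.
// For a plugin this happens inside dlopen, before dlopen returns.
__attribute__((constructor)) static void on_load() {
    g_attr_ctor_ran = true;
    ++g_events;
}

// A C++ object with static storage duration: its constructor also runs
// during the load-time initialization pass.
struct Guard {
    Guard() {
        g_static_ctor_ran = true;
        ++g_events;
    }
};
static Guard g_guard;

// Accessors so that ordinary code can inspect what already happened.
int  load_time_events() { return g_events; }
bool attr_ctor_ran()    { return g_attr_ctor_ran; }
bool static_ctor_ran()  { return g_static_ctor_ran; }
```

Calling the accessors from `main` (or, for a plugin, right after `dlopen` returns) should show that both initializers have already run. Their relative order is deliberately not relied upon here. If either one blocks, the caller never gets past program start or past `dlopen`.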

I guess that one of these constructors is blocked somehow (perhaps on input). You could also run strace on your program.

There might be some other reasons for such blocking, e.g. dlopen-ing an NFS mounted file from an unresponsive NFS server, etc...

See also rtld-audit(7), ld.so(8) and LD_DEBUG environment variable (try to set it to all). Also, run ldd on both the plugin and the program.

BTW, in your code the lack of terminating newline \n in your printf format strings is suspicious (and bad taste), and you should print dlerror() when dlopen fails. At least add a call to fflush(NULL); after your code. Try to code instead:

handle = dlopen(plugin_path.c_str(), RTLD_LAZY);
if(!handle) { 
    printf("dlopening %s failed %s\n", plugin_path.c_str(), dlerror());
} else { 
    printf("dlopen %s success\n", plugin_path.c_str());
}
fflush(NULL);
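Since the hang reportedly shows up only after a few successful load/unload rounds, the same check can be put in a small loop that logs before each `dlopen`, so the last line printed identifies the round that got stuck. This is only a sketch: `reload_rounds` is a hypothetical helper, not part of the original program, and on older glibc it needs to be linked with `-ldl`.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper: loads and unloads `path` up to `rounds` times and
// returns how many rounds completed successfully.
int reload_rounds(const char* path, int rounds) {
    int ok = 0;
    for (int i = 0; i < rounds; ++i) {
        // Log before the call, so a hang inside dlopen is still visible.
        std::fprintf(stderr, "round %d: dlopen %s\n", i, path);
        std::fflush(stderr);

        void* handle = dlopen(path, RTLD_LAZY);
        if (!handle) {
            const char* err = dlerror();
            std::fprintf(stderr, "round %d: dlopen failed: %s\n",
                         i, err ? err : "unknown error");
            break;
        }
        if (dlclose(handle) != 0) {
            const char* err = dlerror();
            std::fprintf(stderr, "round %d: dlclose failed: %s\n",
                         i, err ? err : "unknown error");
            break;
        }
        ++ok;
    }
    return ok;
}
```

A call such as `reload_rounds(plugin_path.c_str(), 100)` would then either return the number of completed rounds or stop printing at the round where `dlopen` blocks, which is the point at which to attach gdb.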

You may also have corrupted your heap (elsewhere in your program) to the point that dlopen (or your plugin) cannot work anymore. Use valgrind to hunt memory corruption bugs!

Basile Starynkevitch
  • Yes, those compile and link parameters were already added, and there is no _init, _fini, or any constructor function. Actually, the plugins dlopen and dlclose correctly sometimes, and hang sometimes. – user2530422 Jul 04 '13 at 05:13