
Perhaps a borderline question, more related to debugging and sysadmin than, stricto sensu, to coding.

I'm working (on Debian/Sid/x86-64) on preparing the next GCC MELT release. MELT is a complex meta-plugin for GCC providing (through a free software GPLv3 plugin) a domain-specific language for GCC, translated to C++.

(If you want the current snapshot, download http://gcc-melt.org/melt-plugin-snapshot-r213101-2014july27.tar.bz2; it is 3.8 Mbytes.)

The build procedure of MELT is arcane and takes 15 minutes (on an i7 3770K). It involves runtime-generated C++ code, generated shell scripts and complex makefiles. So it is a nightmare.

I have a bug in some of my code. A dlopen is failing. The dlopen is done by some system GCC compiler (actually, of course, by its cc1plus). This happens in deeply nested scripts (e.g. a make invoking an autogen generation of a shell script, then that generated shell script invoking another make, which invokes that GCC command; some shell or make variables are incorrect...).

Of course the system cc1plus is rightly giving a fatal error when that dlopen fails.

My bug (somewhere, I don't know yet where) is that some shell, environment or make variable is wrongly transmitted, and instead of invoking the /usr/bin/g++-4.8 compiler, some intermediate script is invoking /usr/bin/g++, which on my Debian defaults to g++-4.9.

Of course I don't want to recompile my system /usr/bin/g++-4.8 or /usr/bin/g++-4.9

Question:

Is it possible (perhaps thru oprofile or LD_PRELOAD cute tricks?) to run a complex set of processes (some of them being g++ or g++-4.8) and either stop, or breakpoint, any cc1plus or cc1 (either in /usr/lib/gcc/x86_64-linux-gnu/4.8.3/ or in /usr/lib/gcc/x86_64-linux-gnu/4.9.1) whose dlopen is failing?

If the faulty process cc1plus doing the failing dlopen was stopped -without exiting- I could trace all the scripts and make chains starting it.
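For illustration, the LD_PRELOAD idea could be sketched as a tiny shim that interposes dlopen, forwards to the real one, and reports (and optionally stops on) a failure. This is only a sketch under assumptions: the environment variable DLOPEN_TRAP_STOP and the file name dlopen_trap.c are made up for the example, it assumes a glibc-based Linux where cc1plus calls dlopen through the dynamic linker, and it assumes none of the intermediate scripts clears the environment. Since dlopen is called from many places, the shim only logs by default and stops the process only when the variable is set.

```c
#define _GNU_SOURCE             /* needed for RTLD_NEXT */
#include <dlfcn.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical environment variable: when set, a process whose dlopen
   failed stops itself so that its ancestry can be inspected.  */
#define STOP_ENV "DLOPEN_TRAP_STOP"

typedef void *(*dlopen_fn) (const char *, int);

/* Interposed dlopen: forwards to the real one and reports failures.  */
void *
dlopen (const char *filename, int flags)
{
  static dlopen_fn real_dlopen;
  if (!real_dlopen)
    real_dlopen = (dlopen_fn) dlsym (RTLD_NEXT, "dlopen");
  if (!real_dlopen)
    return NULL;

  void *handle = real_dlopen (filename, flags);
  if (!handle)
    {
      /* dlerror () is deliberately not called here, because calling it
         would clear the message that the caller wants to print.  */
      fprintf (stderr, "\n@@ dlopen failed: pid %d ppid %d file %s\n",
               (int) getpid (), (int) getppid (),
               filename ? filename : "(main program)");
      fflush (NULL);
      if (getenv (STOP_ENV))
        raise (SIGSTOP);        /* resume later with: kill -CONT <pid> */
    }
  return handle;
}
```

Assuming the file is saved as dlopen_trap.c, it could be built with gcc -shared -fPIC dlopen_trap.c -o /tmp/dlopen_trap.so -ldl and the whole build started with LD_PRELOAD=/tmp/dlopen_trap.so DLOPEN_TRAP_STOP=1 in the environment. The stopped cc1plus can then be examined with pstree -s on its pid, or through /proc/PID/environ and /proc/PID/cmdline.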

gory details

If you are brave enough to try to reproduce my bug (I don't ask that, it is time consuming!), you need a Debian/Sid having both gcc-4.8 (4.8.3) and gcc-4.9 (4.9.1) installed with their plugin-dev packages, and with the default /usr/bin/gcc being the 4.9.1 version. Download my melt-plugin-snapshot-r213101-2014july27.tar.bz2, extract it into /tmp, and type make MELTGCC=gcc-4.8 GCCMELT_CXX=g++-4.8 in the extracted directory /tmp/melt-plugin-snapshot-r213101-2014july27/

At the end I incorrectly get:

 MELT BUILD SCRIPT INFO: meltfrom=melt-build-script.tpl:375/230-melt-build-script.tpl:451/377 meltmode=translateinit meltbase=warmelt-first meltstage=meltbuild-stage1 meltprevstage=meltbuild-stage0-quicklybuilt meltinit=meltbuild-stage0-quicklybuilt/warmelt-first.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-base.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-debug.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-macro.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-moremacro.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-normal.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-normatch.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-genobj.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-outobj.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-hooks.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-modes.quicklybuilt meltinclude= meltsrc=/tmp/melt-plugin-snapshot-r213101-2014july27/melt/warmelt-first.melt
 MELT BUILD SCRIPT INFO: melt-build-script.tpl:375/230-melt-build-script.tpl:451/377 emit C++ code for warmelt-first of meltbuild-stage1
 MELT BUILD SCRIPT INFO: melt-build-script.tpl:375/230-melt-build-script.tpl:451/377 argument file meltbuild-stage1/warmelt-first.args is
 -Wno-shadow -frandom-seed=ea257ebecb10e5a2143d906632a799b0
  -DGCCMELT_FROM_ARG="melt-build-script.tpl:375/230-melt-build-script.tpl:451/377"
  -fplugin-arg-melt-mode=translateinit
  -fplugin-arg-melt-arg=/tmp/melt-plugin-snapshot-r213101-2014july27/melt/warmelt-first.melt
  -fplugin-arg-melt-output=meltbuild-stage1/warmelt-first
  -fplugin-arg-melt-module-make-command='make'
  -fplugin-arg-melt-module-makefile=/tmp/melt-plugin-snapshot-r213101-2014july27/melt-module.mk
  -fplugin-arg-melt-module-cflags='-I /tmp/melt-plugin-snapshot-r213101-2014july27 -I /tmp/melt-plugin-snapshot-r213101-2014july27/melt/generated  -I. -Imeltbuild-stage1 -Imeltbuild-stage0-quicklybuilt -I /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/melt-headers/1.1-rc1-snap-svnrev-213094 -I /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/include -I /usr/lib/gcc/x86_64-linux-gnu/4.8/plugin/include/c-family -I /tmp/melt-plugin-snapshot-r213101-2014july27/melt/generated -I /tmp/melt-plugin-snapshot-r213101-2014july27'
  -fplugin-arg-melt-init=meltbuild-stage0-quicklybuilt/warmelt-first.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-base.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-debug.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-macro.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-moremacro.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-normal.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-normatch.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-genobj.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-outobj.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-hooks.quicklybuilt:meltbuild-stage0-quicklybuilt/warmelt-modes.quicklybuilt
  -fplugin-arg-melt-workdir=meltbuild-workdir
  -fplugin-arg-melt-tempdir=meltbuild-tempdir
  -fplugin-arg-melt-source-path=meltbuild-stage1:meltbuild-stage0-quicklybuilt:.
  -fplugin-arg-melt-module-path=meltbuild-stage1:meltbuild-stage0-quicklybuilt:.
  -fplugin-arg-melt-bootstrapping
  -fplugin-arg-melt-generate-work-link
  -fplugin-arg-melt-generated-c-file-list=meltbuild-stage1/warmelt-first.cfilist
 meltbuild-empty-file.c
 cc1plus: error: cannot load plugin ./melt.so
 ./melt.so: undefined symbol: _Z28gt_ggc_mx_gimple_statement_dPv
 MELT BUILD SCRIPT FAILURE: melt-build-script.tpl:375/230-melt-build-script.tpl:451/377 failed with arguments @meltbuild-stage1/warmelt-first.args
 Makefile:429: recipe for target 'melt-translator' failed
 make: *** [melt-translator] Error 1
 rm gfmeltgcc_revision _melt-runtime.c gfmeltgcc_run_md5 gfmeltgcc_version_number _meltrunsup-inc.c
 make MELTGCC=gcc-4.8 GCCMELT_CXX=g++-4.8  77.48s user 3.79s system 98% cpu 1:22.34 total

Of course a g++-4.9 is incorrectly dlopen-ing a melt.so for GCC 4.8. The bug is mine, something is starting g++ instead of g++-4.8 (which should have been transmitted by make and shell variables).


workaround ....

I did not find an answer to my question, but I made a workaround: I patched the GCC 4.9.1 source code (file gcc/plugin.c, function try_init_one_plugin) to add some statements after a failing dlopen, e.g.

  dl_handle = dlopen (plugin->full_name, RTLD_NOW | RTLD_GLOBAL);
  if (!dl_handle)
    {
      char*errmsg = xstrdup(dlerror ());
      fprintf(stderr,
              "\n @@ GCC PLUGIN LOAD FAILURE pid %d %s: %s\n",
              (int) getpid(), plugin->full_name, errmsg);
      fflush(NULL);
      system("pstree -a -h -l -s");
      system("ps auxw");
      fflush(NULL);
      fprintf(stderr, 
              "\n @@ GCC PLUGIN LOAD FAILURE sleeping 8 seconds\n");
      fflush(NULL);
      sleep (8);
      fprintf(stderr, 
              "\n @@ GCC PLUGIN LOAD FAILURE selfstopping\n");
      fflush(NULL);
      sleep (1);
      kill (getpid (), SIGSTOP);  /* kill(2) takes the pid first, then the signal */
      fflush(NULL);

      error ("cannot load plugin %s\n%s", plugin->full_name, errmsg);
      free (errmsg);
      return false;
    }

The fprintf, system, fflush, sleep and kill calls above (with the xstrdup and free of the result of dlerror) form my temporary patch. Then I compiled my patched GCC 4.9.1 and, using PATH tricks, ensured that it was the one being called.
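The ancestry that pstree -s prints in the patch can also be read straight from /proc, for instance from a separate helper pointed at the pid of the stopped cc1plus. The following is a minimal sketch, assuming Linux with /proc mounted; the function names parent_of and print_ancestry are invented for the example.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the parent pid of PID by parsing /proc/PID/stat, or -1 on error.  */
pid_t
parent_of (pid_t pid)
{
  char path[64], buf[1024];
  snprintf (path, sizeof path, "/proc/%d/stat", (int) pid);
  FILE *f = fopen (path, "r");
  if (!f)
    return -1;
  size_t n = fread (buf, 1, sizeof buf - 1, f);
  fclose (f);
  buf[n] = '\0';
  /* The command name is parenthesized and may itself contain spaces or
     ')', so parse from the last ')': the fields after it are
     "state ppid ...".  */
  char *p = strrchr (buf, ')');
  char state;
  int ppid = -1;
  if (!p || sscanf (p + 1, " %c %d", &state, &ppid) != 2)
    return -1;
  return (pid_t) ppid;
}

/* Print PID and each of its ancestors, one command line per row,
   indented by depth.  Return the number of processes printed.  */
int
print_ancestry (pid_t pid, FILE *out)
{
  int depth = 0;
  while (pid > 0)
    {
      char path[64], cmd[4096];
      snprintf (path, sizeof path, "/proc/%d/cmdline", (int) pid);
      FILE *f = fopen (path, "r");
      size_t n = f ? fread (cmd, 1, sizeof cmd - 1, f) : 0;
      if (f)
        fclose (f);
      /* Arguments are NUL-separated; turn the separators into spaces.  */
      for (size_t i = 0; i + 1 < n; i++)
        if (cmd[i] == '\0')
          cmd[i] = ' ';
      cmd[n] = '\0';
      fprintf (out, "%*s%d %s\n", depth * 2, "", (int) pid, cmd);
      depth++;
      pid = parent_of (pid);
    }
  return depth;
}
```

A small main could call print_ancestry with the pid reported in the "GCC PLUGIN LOAD FAILURE" message; each row then shows one make, shell or g++ invocation in the chain, which is where the wrong compiler name would become visible.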

With that, I was able to correct my initial bug.

Patching dlopen would be difficult and quite dangerous (too often used in too many places).

I also thought of process accounting (acct(2) & acct(5) ...). In the end, patching GCC was the easiest (but a bit shameful) thing to do.

BTW, such bugs are why I believe more and more in free software. If my plugin-accepting compiler were closed-source, I would have a much harder time finding the bug.

N.B. I will remove the plugin snapshot in a few days.

Basile Starynkevitch
