
I'm seeing corrupted arguments when data is passed to a particular function in some generated code. The problem occurs whenever optimizations are enabled (-O1 and higher), but not at -O0.

The C code is generated by OpenModelica 1.13.0-dev and compiled on CentOS 6.9 (32-bit) using gcc 4.4.7. I know my setup is old, but I cannot change it.

I was able to step into the code with gdb and get the frame of the faulty function at -O0:

    __OMC_DIV_SIM (threadData=0x83bb7e0, a=0.90000000000000002, b=1, msg=0xac4f6024 "PMECH1 - D * SLIP / 1.0 + SLIP", equationIndexes=0xbfffd4a0,
        noThrowDivZero=1 '\001', time_=0, initial_=1 '\001')

And here is the frame of the same function at -O2:

    __OMC_DIV_SIM (threadData=0x83bb7a8, a=-9.2559642734470712e+61, b=5.298772688916812e-315, msg=0x1 <Address 0x1 out of bounds>,
        equationIndexes=0x3ff00000, noThrowDivZero=-60 '\304', time_=3.7134892271125328e-314, initial_=0 '\000')
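
One way to read the garbled -O2 values is to look at their raw bit patterns. The sketch below prints the two 32-bit halves of a double; the helper names are hypothetical and not part of OpenModelica, and it assumes IEEE-754 doubles as used on x86. With it, `equationIndexes=0x3ff00000` turns out to equal the high word of `1.0` (the expected `b`), and the garbled `b` has a zero high word and a low word in the same range as the high word of `0.9` (the expected `a`). That would be consistent with the arguments being read at a wrong offset rather than being overwritten with random data, but it is only an observation, not a diagnosis.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw bit pattern of a double.
   memcpy is used instead of a pointer cast to avoid aliasing problems. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t)); /* assumption: 64-bit doubles */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Upper 32 bits: sign, exponent and the top of the mantissa. */
static uint32_t high_word(double d)
{
    return (uint32_t)(double_bits(d) >> 32);
}

/* Lower 32 bits: the rest of the mantissa. */
static uint32_t low_word(double d)
{
    return (uint32_t)(double_bits(d) & 0xFFFFFFFFu);
}

/* Print a value next to its two halves, e.g. for values copied from gdb. */
static void dump_double(const char *label, double d)
{
    printf("%-10s %.17g -> high=0x%08lx low=0x%08lx\n", label, d,
           (unsigned long)high_word(d), (unsigned long)low_word(d));
}
```

Calling `dump_double` on `0.9`, `1.0` and on the values gdb reports at -O2 makes it easy to compare the halves side by side with the pointer-sized arguments in the same frame.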

Everything I found when searching for `Address 0x1 out of bounds` points to stack or memory corruption, so I ran my executable through valgrind:

    ==5351== Invalid read of size 1
    ==5351==    [...]
    ==5351==    by 0x7EA13F4: __OMC_DIV_SIM (division.h:66)
    ==5351==    [...]
    ==5351==  Address 0x1 is not stack'd, malloc'd or (recently) free'd
    ==5351==
    ==5351==
    ==5351== Process terminating with default action of signal 11 (SIGSEGV)
    ==5351==  Access not within mapped region at address 0x1
    ==5351==    [...]
    ==5351==    by 0x7EA13F4: __OMC_DIV_SIM (division.h:66)
    ==5351==    [...]
    ==5351==  If you believe this happened as a result of a stack
    ==5351==  overflow in your program's main thread (unlikely but
    ==5351==  possible), you can try to increase the size of the
    ==5351==  main thread stack using the --main-stacksize= flag.
    ==5351==  The main thread stack size used in this run was 8388608.

The generated code is hard to read as-is, since it is not meant to be read by users. I modified it slightly to make it more readable and to pinpoint the exact call that fails.

    double value = 0.0;
    if ((long)data->localData[0]->integerVars[0] /* TRIPI */ == ((long) 0))
    {
            double PMECH1 = data->localData[0]->realVars[24];
            double D = data->simulationInfo->realParameter[21];
            double SLIP = data->localData[0]->realVars[4];
            double TELEC = data->localData[0]->realVars[35];
            double H = data->simulationInfo->realParameter[24];

            double div_1 = 0.0;
            {
                    double a = PMECH1 - ((D) * (SLIP));
                    double b = 1.0 + SLIP;

                    div_1 = __OMC_DIV_SIM(threadData, a, b, "PMECH1 - D * SLIP / 1.0 + SLIP", equationIndexes, data->simulationInfo->noThrowDivZero, data->localData[0]->timeValue, initial());
            }
            // ...
    }

And here is the header that contains `__OMC_DIV_SIM`:

    #define DIVISION_SIM(a,b,msg,equation) (__OMC_DIV_SIM(threadData, a, b, msg, equationIndexes, data->simulationInfo->noThrowDivZero, data->localData[0]->timeValue, initial()))

    int valid_number(double a)
    {
      return !isnan(a) && !isinf(a);
    }

    static inline modelica_real __OMC_DIV_SIM(threadData_t *threadData, const modelica_real a, const modelica_real b, const char *msg, const int *equationIndexes, modelica_boolean noThrowDivZero, const modelica_real time_, const modelica_boolean initial_)
    {
      modelica_real res;
      if(b != 0.0)
        res = a/b;
      else if(initial_ && a == 0.0)
        res = 0.0;
      else
        res = a / division_error_equation_time(threadData, a, b, msg, equationIndexes, time_, noThrowDivZero);

      if(!valid_number(res))
        throwStreamPrintWithEquationIndexes(threadData, equationIndexes, "division leads to inf or nan at time %g, (a=%g) / (b=%g), where divisor b is: %s", time_, a, b, msg);
      return res;
    }
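
To check whether the function misbehaves in isolation at -O2, a standalone harness can help. The following is a minimal sketch, not the OpenModelica code: the typedefs for `modelica_boolean` and `threadData_t`, the two `stub_*` functions and the name `standalone_div_sim` are assumptions made for illustration, and only the parameter list and control flow are copied from the function above. `__attribute__((noinline))` is GCC-specific and is used here so that the eight arguments really go through the calling convention instead of being inlined away.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Assumed typedefs: modelica_real is double in this setup; the other two
   are placeholders for the real OpenModelica types. */
typedef double modelica_real;
typedef signed char modelica_boolean;
typedef struct { int unused; } threadData_t;

static int stub_error_calls = 0; /* calls to the division-by-zero stub */
static int stub_throw_calls = 0; /* calls to the inf/nan stub */

/* Hypothetical stand-in for division_error_equation_time: logs the event
   and returns an arbitrary non-zero divisor so the caller can continue. */
static modelica_real stub_division_error(const char *msg, modelica_real time_)
{
    stub_error_calls++;
    fprintf(stderr, "division by zero at time %g, divisor: %s\n", time_, msg);
    return 1.0;
}

/* Hypothetical stand-in for throwStreamPrintWithEquationIndexes. */
static void stub_throw(const char *msg, modelica_real a, modelica_real b)
{
    stub_throw_calls++;
    fprintf(stderr, "inf or nan: (a=%g) / (b=%g), divisor: %s\n", a, b, msg);
}

static int valid_number(double a)
{
    return !isnan(a) && !isinf(a);
}

/* Same parameter list and branches as __OMC_DIV_SIM, with stubs for the
   error paths. noinline forces a real call at every optimization level. */
__attribute__((noinline))
static modelica_real standalone_div_sim(threadData_t *threadData,
    const modelica_real a, const modelica_real b, const char *msg,
    const int *equationIndexes, modelica_boolean noThrowDivZero,
    const modelica_real time_, const modelica_boolean initial_)
{
    modelica_real res;
    (void)threadData; (void)equationIndexes; (void)noThrowDivZero;

    if (b != 0.0)
        res = a / b;
    else if (initial_ && a == 0.0)
        res = 0.0;
    else
        res = a / stub_division_error(msg, time_);

    if (!valid_number(res))
        stub_throw(msg, a, b);
    return res;
}
```

Calling `standalone_div_sim(NULL, 0.9, 1.0, "1.0 + SLIP", NULL, 1, 0.0, 1)` from a small `main` and comparing the result at -O0 and -O2 would show whether argument passing alone is enough to reproduce the corruption. If it is not, the caller or an earlier out-of-bounds write becomes the more likely suspect.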

I cannot test this code with a different gcc, but I was able to build and run another component in the same environment, and it passed without any trouble. I still have no idea what is corrupting the arguments passed to `__OMC_DIV_SIM`. Could it be a bug in the compiler, or in the generated code?

bl4ckb0ne
  • Almost certainly Undefined Behaviour. – wizzwizz4 Apr 24 '18 at 18:08
  • How do you know that "modifying the code to make it more readable" didn't remove the bug? There are reasons for providing the [Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve) that shows the problem. One is that your code can be compiled and run. Another is that the very act of honing the code down can often reveal the problem anyway. – Weather Vane Apr 24 '18 at 18:14
  • @WeatherVane It took me several attempts to arrive at this version. I did it because there are several calls to `DIVISION_SIM` in the same function and I wanted to see which one was corrupted. I can put back the generated function if you'd like – bl4ckb0ne Apr 24 '18 at 18:24
  • @Mulliganaceous the memory allocation in this code is handled by [libgc](https://github.com/ivmai/bdwgc) – bl4ckb0ne Apr 24 '18 at 19:22
  • Compile with `-Wcast-align` and run under the undefined behavior sanitizer or `-fsanitize=undefined`. Look for misaligned loads related to your floats and doubles. – jww Apr 24 '18 at 23:31
  • @jww unfortunately I cannot use any sanitizer with the current setup. – bl4ckb0ne Apr 25 '18 at 00:30
  • Do we know that `modelica_real` is the same thing as `double`? – Arndt Jonasson Apr 25 '18 at 11:43
  • @ArndtJonasson yes, I did a search in the available sources – bl4ckb0ne Apr 25 '18 at 12:56
  • Is there a chance to see the complete code, so it can be tried out? – Arndt Jonasson Apr 25 '18 at 13:58
  • @ArndtJonasson it's a 5 MB archive of raw code, do you still want it? – bl4ckb0ne Apr 25 '18 at 14:35
  • Not really. Is it in `__OMC_DIV_SIM` that the crash occurs? It seems to just pass `msg` on to two other functions. – Arndt Jonasson Apr 25 '18 at 14:39
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/169770/discussion-between-bl4ckb0ne-and-arndt-jonasson). – bl4ckb0ne Apr 25 '18 at 14:40

0 Answers