I have a C function that I use to extend Python. I previously exposed it with Boost.Python's `BOOST_PYTHON_MODULE` macro, and this error appeared when I switched to the Python/C API. I am certain that the `run_mymodule` function runs fine without this wrapper.

```cpp
static PyObject * wrap_run_mymodule(PyObject *, PyObject *args) {
    char *file1, *file2, *file3;
    PyObject *tmpp;
    if(!PyArg_ParseTuple(args, "sssO", &file1, &file2, &file3, &tmpp))
        return NULL;
    return Py_BuildValue("i", run_mymodule(file1, file2, file3, tmpp));
}

static PyMethodDef myModule_methods[] = {
    {"run_mymodule", (PyCFunction) wrap_run_mymodule, METH_VARARGS},
    {NULL, NULL}
};

extern "C" void initmymodule(void)
{
    (void) Py_InitModule("mymodule", myModule_methods);
}
```

The function is declared as follows: `int run_mymodule(char *file1, char *file2, char *file3, PyObject *tmpp)`

Here is the exact error message I get:

```
python(35137,0x7fff76453310) malloc: *** error for object 0x10afcfb78: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap: 6
```

How can I solve this problem? Where is this malloc error coming from? In Python, I pass strings as the first three arguments and a Python class as the fourth argument. I am happy to put probes into my code.

SanderMertens suggested I post the valgrind output:

```
$ valgrind python test_mymodule.py
==30715== Memcheck, a memory error detector
==30715== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==30715== Using Valgrind-3.10.1 and LibVEX; rerun with -h for copyright info
==30715== Command: python test_mymodule.py
==30715== 
==30715== Syscall param posix_spawn(pid) points to unaddressable byte(s)
==30715==    at 0x3D266E: __posix_spawn (in /usr/lib/system/libsystem_kernel.dylib)
==30715==    by 0x100001DC2: ??? (in /usr/local/bin/python)
==30715==    by 0x25E5FC: start (in /usr/lib/system/libdyld.dylib)
==30715==    by 0x1: ???
==30715==    by 0x1000138CF: ???
==30715==    by 0x104803AD1: ???
==30715==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==30715== 
Python(30715,0x7fff74a8e310) malloc: *** mach_vm_map(size=140735173898240) failed (error code=3)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
libc++abi.dylib: terminating with uncaught exception of type std::bad_alloc: std::bad_alloc
Abort trap: 6
```
kilojoules
  • This suggests you are corrupting the stack (probably a buffer overflow) and adding the print statement alters the stack layout. Use a debugger to find out why `bad_alloc` gets thrown; it's probably because something tries to allocate a huge chunk of memory after the number of bytes to allocate has been corrupted. Find where that value is, then find where it's corrupted. Fix that. – Jonathan Wakely Jun 26 '15 at 15:36
  • This strange behavior was encountered when using the `PyArg_ParseTuple` function. – kilojoules Jun 26 '15 at 15:36
  • Please [edit](http://stackoverflow.com/posts/31077040/edit) your question with a [SSCCE](http://sscce.org). – NathanOliver Jun 26 '15 at 15:37
  • What does `ba_exception.what()` return? Are all these objects initialized to some default values? You can check for a `bad_alloc` exception as follows: `int main() { try { int* a = new int[10]; } catch (std::bad_alloc& ba_exception) { std::cerr << "bad_alloc caught: " << ba_exception.what() << '\n'; } return 0; }` – JamesWebbTelescopeAlien Jun 26 '15 at 15:38
  • @NulledPointer, typically `bad_alloc::what()` just returns a string like "bad_alloc", which doesn't tell you anything. It's more useful to inspect where it gets thrown from and see _why_ it's thrown. – Jonathan Wakely Jun 26 '15 at 15:39
  • @NathanOliver I have added a SSCCE to the best of my ability – kilojoules Jun 26 '15 at 15:47
  • @kilojoules It appears from `mach_vm_map(size=2954545942545833984)` that you are trying to allocate far too much memory. You seem to have memory leaks OR your function is trying to `malloc` memory whose size is determined at runtime (are you parsing the size to be allocated somewhere?) – JamesWebbTelescopeAlien Jun 26 '15 at 17:24
  • @NulledPointer the `tmpp` variable is a python module, so I guess it is dynamically allocated. – kilojoules Jun 26 '15 at 17:35
  • The bad allocation is right, as @NulledPointer said. But I think it is due to `PyArg_ParseTuple()` going off the deep end, corrupting memory, and inadvertently leading to that crazy alloc attempt. I say that because the `std::cout` is having an effect. `PyArg_ParseTuple()` takes a variable argument list, dependent on `args`. It may be accidentally using the return address that's on the stack as a pointer to storage where it writes or reads. I'd take a closer look at `args`. – donjuedo Jun 26 '15 at 22:11
  • What do you mean by "that a simpler cout statement still fixes the error"? – tynn Jun 30 '15 at 07:27
  • @tynn previously my cout statement was `std::cout << "dumping values " << &file1 << &file2 << &file3 << "\n" << &var_which_is_pyobject <<"\n";`, and I changed it to `std::cout << "probe\n";`. In both cases, the malloc error disappears. – kilojoules Jun 30 '15 at 22:20
  • Upon further investigation, adding a `std::cout` statement reduces the frequency of this error but does not eliminate it entirely – kilojoules Jul 07 '15 at 16:10
  • You might want to try running it in valgrind. It will most likely tell you where & when memory corruption started. Just prefix your command with 'valgrind' (so like 'valgrind python – Sander Mertens Jul 07 '15 at 21:34
  • @SanderMertens I added the output of valgrind to my post – kilojoules Jul 07 '15 at 22:09
  • The posted code looks fine, although `PyArg_ParseTuple()`'s `s` format unit is associated with `const char*`, not `char*`. I was not able to reproduce the problem with this [example](http://coliru.stacked-crooked.com/a/c470b999b6d0aada). Are you able to reproduce the problem when `run_mymodule()` only returns a number and `mymodule.run_mymodule()` is invoked from a simple script? – Tanner Sansbury Jul 08 '15 at 18:44
  • @TannerSansbury Could you try running the example with a python module being passed instead of the integer 1? – kilojoules Jul 09 '15 at 23:08
  • If the error states `pointer being freed was not allocated` and in your [other question](http://stackoverflow.com/questions/31329062/error-value-of-type-pyobject-aka-object-is-not-contextually-convertible) you use `PyObject tmp` in a function signature, I'd assume there's a correlation. – tynn Jul 09 '15 at 23:24
  • You are corrupting a memory block. Adding a cout statement simply changes the stack layout, which "hides" the bug. Changing the compilation options would have a similar effect (e.g., in debug there is more info laid down on the stack and that can also "hide" the bug). Seeing the source code for `run_mymodule` would help to find where the corruption occurs. Try debugging by bisection, removing code until you have the smallest fragment that still displays the bug. Then share that. – Laurent Michel Jul 10 '15 at 00:09
  • @tynn Can you expand on your comment? What do you mean? – kilojoules Jul 10 '15 at 01:26
  • It's late for me, so I'll try. Usually you'll only have `PyObject *` pointers in your program. These are allocated on the heap and freed when the refcount is 0. Now you've got some `int func(PyObject tmp)`. The parameter `tmp` is on the stack and more or less a copy of some other object. If somehow the refcount becomes 0, the object gets deallocated and thus freed. But since it's on the stack, the memory was never allocated by malloc. Try `void t(PyObject o){while(1)Py_DECREF(&o);}` on some object and see if it's the same error as yours. I've got a `double free or corruption` error on Linux. – tynn Jul 10 '15 at 01:47
  • @kilojoules In the minimal example I posted, the 4th argument's type has no effect on the program (see [here](http://coliru.stacked-crooked.com/a/5d1ed82e3fd69889)). The problem likely exists in either the C `run_mymodule()` function or calls leading up to the invocation of `mymodule.run_mymodule()`. Can you please reproduce the problem in a [mcve](http://stackoverflow.com/help/mcve)? The lack of a reproducible example is just resulting in speculation. If you cannot produce a mcve, can you verify whether the problem exists when `run_mymodule()` only returns a number, similar to the example? – Tanner Sansbury Jul 10 '15 at 04:45
  • It may also be helpful to publish the source of `run_mymodule()` if it isn't too big. – Peter Brittain Jul 12 '15 at 09:35
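
tynn's point about a by-value `PyObject` parameter can be illustrated without Python at all. The following is only a sketch under stated assumptions: `ToyObject` is a hypothetical stand-in with a single refcount field, not the real `PyObject` layout, and all function names are made up for illustration. It shows that a by-value parameter is a separate object in the callee's stack frame, so any refcount change (or deallocation) made through its address never touches the original heap object.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted object.
// This is NOT the real PyObject layout.
struct ToyObject {
    long refcount;
};

// By value: `copy` is a new object in this function's stack frame,
// so its address can never equal the address of the original.
bool by_value_sees_original(ToyObject copy, const ToyObject *original) {
    return &copy == original;
}

// By pointer: no copy is made, the callee works on the original.
bool by_pointer_sees_original(ToyObject *ptr, const ToyObject *original) {
    return ptr == original;
}

// Decrementing through a by-value parameter only changes the stack copy.
// In real Python code, a refcount reaching 0 here would hand a stack
// address to the deallocator, which is what tynn's comment describes.
long decref_by_value(ToyObject copy) {
    --copy.refcount;
    return copy.refcount;
}

// Small driver that exercises the helpers above on a heap object.
int run_demo() {
    ToyObject *heap = new ToyObject{1};
    assert(!by_value_sees_original(*heap, heap));
    assert(by_pointer_sees_original(heap, heap));
    assert(decref_by_value(*heap) == 0);  // the copy reached zero
    assert(heap->refcount == 1);          // the original is untouched
    delete heap;
    return 0;
}
```

Calling `run_demo()` from a `main` function is enough to exercise it. If `run_mymodule` or anything it calls takes a `PyObject` rather than a `PyObject *`, this is the situation to look for.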

1 Answer

As has been explained in several of the comments now, you have a memory corruption issue. Key indicators are:

  • The problem reproduces erratically.
  • The exact symptoms vary, from unexpectedly bad values being passed into some functions to bizarrely large memory allocation failures.
  • These symptoms don't reproduce for others when `run_mymodule` has a trivial implementation.

Given that valgrind isn't pinpointing it for you, it's probably stack corruption: Memcheck is much better at catching bad heap accesses than overruns of stack arrays. You could look for the places where your code calls the failing function (from the valgrind output) and examine what your code did before that, but this will be slow.

Your best bet at this stage is to use a tool that checks stack accesses. For example, assuming you're using gcc, try AddressSanitizer (`-fsanitize=address`) or mudflap (`-fmudflap`), depending on which version you're using.
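
As a rough illustration of the kind of bug AddressSanitizer is good at catching, here is a minimal sketch. Both helpers are hypothetical and are not taken from the question's code. `unchecked_copy_length` overruns a stack buffer whenever its input is 8 characters or longer, while `checked_copy_length` truncates instead.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies `src` into a fixed-size stack buffer
// without checking its length. Any input of 8 or more characters
// writes past the end of `buf` and corrupts the stack.
std::size_t unchecked_copy_length(const char *src) {
    char buf[8];
    std::strcpy(buf, src);  // BUG when strlen(src) >= sizeof(buf)
    return std::strlen(buf);
}

// Bounds-checked variant: copies at most 7 characters and always
// terminates the buffer, so long inputs are truncated, not overrun.
std::size_t checked_copy_length(const char *src) {
    char buf[8];
    std::strncpy(buf, src, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';
    return std::strlen(buf);
}
```

Built with something like `g++ -g -fsanitize=address -fno-omit-frame-pointer` (the file name and exact flag set are up to you), a call such as `unchecked_copy_length("a much longer string")` should abort with a stack-buffer-overflow report that names the offending line, instead of failing later in an unrelated allocation. For a Python extension, the sanitizer runtime generally also has to be loaded into the interpreter process, so expect some extra setup.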

Peter Brittain
  • The `-fsanitize` and `-fmudflap` flags weren't recognized by clang-3.6. `-fstack-protector` didn't change my output. – kilojoules Jul 09 '15 at 22:48
  • @kilojoules: I've not used clang, but according to their [3.6.0 user guide](http://llvm.org/releases/3.6.0/tools/clang/docs/UsersManual.html#controlling-code-generation) it should recognize `-fsanitize=address`. Is this the compiler you're using? – Peter Brittain Jul 09 '15 at 22:57
  • Yes I'm using clang-3.6 from brew's llvm. – kilojoules Jul 10 '15 at 01:26
  • @kilojoules So what error do you get when you follow their instructions for [address sanitizer](http://clang.llvm.org/docs/AddressSanitizer.html)? – Peter Brittain Jul 10 '15 at 06:56
  • It seems like brew's llvm clang can't handle the `-fsanitize=address` option. `Symbol not found: ___asan_option_detect_stack_use_after_return`. – kilojoules Jul 13 '15 at 16:05
  • @kilojoules Possibly a link error? Looks like other people have hit similar issues [here](https://github.com/xiaoyur347/address-sanitizer/issues/380) and [here](https://github.com/rocker-org/rocker/issues/125) that might help. – Peter Brittain Jul 13 '15 at 17:41
  • For the record, OS X clang simply does not support any `-fsanitize` arguments as far as I can tell. Brew's llvm has not been fruitful in this matter either. – kilojoules Jul 20 '15 at 05:49
  • @kilojoules That's surprising given their documentation, but I don't use OSX, so can't help track down why. Sorry. Just one final thought: did you see the bit on using `DYLD_INSERT_LIBRARIES` for some versions of OSX? – Peter Brittain Jul 20 '15 at 09:07