Is G-WAN supposed to work with valgrind?
We have tested Valgrind and while it does many things right, it is just not suitable for high-concurrency jobs (even low-concurrency is a problem with Valgrind).
viable options to detect memory bugs in C code running under G-WAN?
Use malloc() wrappers, pre-allocated pools, or even better, use alloca() to avoid memory issues in the first place.
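As an illustration of the malloc() wrapper idea, here is a minimal sketch. The names xmalloc(), xfree() and xleaks() are hypothetical (they are not part of G-WAN or of any libc), and the single global counter assumes single-threaded use; a multi-threaded server would need one counter per thread or an atomic counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical accounting layer over the libc allocator (not a G-WAN API). */
static size_t xlive_blocks = 0;  /* allocations not yet released */

/* Allocate zeroed memory and count the block as live. */
static void *xmalloc(size_t size)
{
    void *p = calloc(1, size ? size : 1);
    if (p)
        xlive_blocks++;
    return p;
}

/* Release a block through the variable that owns it and reset that
   variable to NULL, so a second xfree() on it is a harmless no-op. */
static void xfree(void **pp)
{
    assert(pp != NULL);
    if (*pp) {
        free(*pp);
        *pp = NULL;
        xlive_blocks--;
    }
}

/* Blocks still allocated; a non-zero value once a request is done hints at a leak. */
static size_t xleaks(void)
{
    return xlive_blocks;
}
```

A script would call xmalloc(), release with xfree(&ptr), and check xleaks() before returning. This only catches leaks and repeated frees through the same variable; it does not detect buffer overruns or frees through a copied pointer.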
Note that G-WAN handles bad pointers in C scripts without crashing the server, see: http://gwan.ch/developers#crash
This buggy code:
int main(int argc, char *argv[])
{
    strcpy(0xBADC0DE, 0xBADC0DE);
    return 200;
}
...will produce something like the following 'graceful' crash report:
Script: crash_libc.c
Client: 127.0.0.1
Query : ?crash_libc
Signal : 11:Address not mapped to object
Signal src : 1:SEGV_MAPERR
errno : 0
Thread : 0
Code Pointer: 0000f5200b33 (module:/lib/libc.so.6, function:strcpy, line:0)
Access Address: 00000badc0de
Registers : EAX=00000badc0de CS=00000033 EIP=0000f5200b33 EFLGS=000000010202
EBX=000000000001 SS=ec2d8ed4 ESP=0000f5ded828 EBP=0000f5dee020
ECX=000033323130 DS=ec2d8ed4 ESI=0000ec2d8f86 FS=00000033
EDX=000003b03c00 ES=ec2d8ed4 EDI=00000badc0de CS=00000033
Module :Function :Line # PgrmCntr(EIP) RetAddress FramePtr(EBP)
libc.so.6: strcpy: - 0000f5200b33 0000ec2d8f00 0000f5dee020
servlet: main: 37 0000ec2d8f00 00000042e10c 0000f5dee020
And G-WAN goes as far as to tell you where the bug happened in your source code (see the G-WAN crash_xxx.c examples) instead of killing the server process.
If you don't want to debug C code, then use Java or Scala (both supported by G-WAN) - you will need much more memory because your data will remain loaded until the GC slows everything down to free what it thinks can be freed - but at least you will enjoy fewer memory-related bugs, if any.
Per the request of the person asking the question, here are more details.
In late 2012, we tested a dozen free and commercial tools which, like Valgrind, are supposed to help debug concurrency. We also used static tools that study source code, not only dynamic tools that work on running (compiled) programs.
The sad truth is that they all suffer from common problems, they:
- are generally too slow to support concurrency (the core issue)
- produce gazillions of trivial alerts (and even more false alerts)
- are very expensive (that's for the commercial ones of course) and cannot always be tested before buying(!)
So, after weeks of checking and filtering all those results, we spent a lot of time "correcting" the G-WAN codebase to remove the trivial and false alerts (alerts caused by tools that can't distinguish valid code from buggy code)... but, to our dismay at the time, we did not find any real bug in G-WAN (making it clear that those weeks were wasted time).
Hence the conclusion above: try to write simple code when possible, and try to pre-allocate blocks when more sophisticated strategies are needed.
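To make "pre-allocate blocks" concrete, here is a minimal sketch of a fixed-size block pool. Everything in it (pool_t, pool_init(), pool_alloc(), pool_free(), and the two size constants) is a hypothetical illustration, not G-WAN code, and it assumes that each pool is used by a single thread.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BLOCKS     64   /* hypothetical capacity */
#define POOL_BLOCK_SIZE 256  /* hypothetical block size, in bytes */

/* A block is either linked in the free list or handed out as raw bytes. */
typedef union pool_block {
    union pool_block *next;               /* used while the block is free */
    unsigned char data[POOL_BLOCK_SIZE];  /* used while the block is allocated */
} pool_block_t;

typedef struct {
    pool_block_t blocks[POOL_BLOCKS];  /* all storage is reserved up front */
    pool_block_t *free_list;           /* head of the list of free blocks */
} pool_t;

/* Thread every block onto the free list. */
static void pool_init(pool_t *pool)
{
    assert(pool != NULL);
    pool->free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool->blocks[i].next = pool->free_list;
        pool->free_list = &pool->blocks[i];
    }
}

/* Pop a block, or return NULL when the pool is exhausted. */
static void *pool_alloc(pool_t *pool)
{
    pool_block_t *b = pool->free_list;
    if (!b)
        return NULL;
    pool->free_list = b->next;
    return b->data;
}

/* Push a block back; ptr must come from pool_alloc() on the same pool. */
static void pool_free(pool_t *pool, void *ptr)
{
    pool_block_t *b = (pool_block_t *)ptr;
    if (!b)
        return;
    b->next = pool->free_list;
    pool->free_list = b;
}
```

Because the storage is reserved once, no malloc() or free() call happens while requests are served, so there is nothing for the libc heap checks to abort on. The trade-off is a fixed capacity: the caller has to handle a NULL return when the pool is exhausted.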
Of course, the fact that the Linux LIBC insists on killing applications with (non-catchable) abort signals does not help (this prevents the program from recovering or from dumping a relevant trace), especially with the sloppy double-free detection of the Linux LIBC (which wrongly assumes that all the code is using its malloc() as soon as a program has used malloc() once - which is often done by LIBC calls!). And I am not even talking about mmap() failures or about the OOM kill-switch.
The only solution that we have found working so far is to avoid using the Linux LIBC, and to compile everything we need with our own C runtime. This is a bit difficult to recommend as "the thing to do" for all users, but it worked for us.
We would be very happy to see portions of our code (or at least some of the concepts implemented in G-WAN) used by Linux, as this would make our life (and that of many other developers) immensely easier, but the contacts that we have had in the past with "the people in charge" were not encouraging.
All in all, there's room for improvement, from the OS, from ISVs like us, and from developers - after all, concurrency has "only" been mainstream since 2004... almost ten years ago.