
:) I'm trying to port some legacy code (a large program) to CentOS 7, but I'm hitting a snag. The core of the code is a rather awkward structure built around using mmap to map a file at a hard-coded address. The file acts like a database (and is built by one) and contains hard-coded pointers to different sections of the mapped memory. Very ugly, but it is what it is. The entire program is built around this structure, and nobody is going to fund a rewrite.

The problem is the mmap call. This used to work, but it no longer does on CentOS 7:

mmapAddr = mmap ((void *) SMAddr, SMA_WINDOW_SIZE, PROT_READ | (readOnly ? 0 : PROT_WRITE),MAP_FILE | MAP_FIXED | MAP_SHARED, SMFileDesc, 0);

... where SMAddr is 0x8000000, SMA_WINDOW_SIZE is 127926272 (0x7A00000, i.e. exactly 122 MiB), and readOnly is false. In other words, it maps a file at address 0x8000000 with a 122MB window.
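
Because MAP_FIXED silently replaces whatever is already mapped in the target range, it may help to look at what currently occupies that window before making the call. A minimal Linux-only sketch, assuming /proc is mounted; report_overlaps is a hypothetical helper, not part of the original program:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: prints every line of /proc/self/maps whose range
 * overlaps [start, start + len) and returns how many there were, or -1 if
 * /proc/self/maps cannot be opened. Linux-only. */
static int report_overlaps(uintptr_t start, size_t len)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    uintptr_t end = start + len;
    char line[4096];   /* long enough for the path column in practice */
    int hits = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        unsigned long lo, hi;  /* each line starts "lo-hi perms ..." in hex */
        if (sscanf(line, "%lx-%lx", &lo, &hi) != 2)
            continue;
        if (lo < end && hi > start) {   /* half-open ranges intersect */
            fputs(line, stderr);
            hits++;
        }
    }
    fclose(maps);
    return hits;
}
```

Calling report_overlaps((uintptr_t)SMAddr, SMA_WINDOW_SIZE) right before the mmap would list any existing mappings (shared libraries, heap, vDSO) that MAP_FIXED is about to replace.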

I have no idea what changed between versions. I do note, however, that the file being mapped is only 1.5MB. I'm not sure exactly why the program maps so much more than the file size, but I know it's needed, and a lot of care evidently went into picking the 122MB figure.

Could a mismatch between the actual file size and the mapped size have been fine in the past but not any more? I know that SIGBUS means an attempt to access an invalid memory region. Since mmap isn't handed any pointer that my code allocated, the bad access has to be something it does internally.

I tried catching and blocking SIGBUS (thinking that maybe it'd be ignorable?), but the program still crashed with a SIGBUS at the same spot. Maybe I did that wrong.
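
For reference, blocking or ignoring SIGBUS cannot work here: POSIX leaves the behaviour undefined when a hardware-generated SIGBUS is blocked or ignored, and Linux in practice still kills the process. The signal can be caught with a handler, though. The sketch below (hypothetical helper names, Linux/POSIX only) maps two pages of a one-byte temporary file, writes to the second page, which lies entirely past end-of-file, and recovers from the resulting SIGBUS with sigsetjmp/siglongjmp. It only demonstrates the mechanism; recovering like this inside the legacy program would not make the missing file data appear.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* SIGBUS handler: jump back out of the faulting memory access. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Hypothetical demo: maps two pages of a one-byte temporary file and writes
 * to the second page, which lies wholly past end-of-file.
 * Returns 1 if that write raised SIGBUS, 0 if not, -1 on setup failure. */
static int touch_past_eof_raises_sigbus(void)
{
    size_t pg = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                          /* removed once fd and mapping go */
    if (write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, 2 * pg, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                             /* the mapping keeps the file alive */
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int faulted = 0;
    if (sigsetjmp(bus_env, 1) == 0) {      /* 1: restore signal mask on jump */
        volatile char *v = p;
        v[0] = 'y';                        /* page holding EOF: allowed */
        v[pg] = 'z';                       /* page past EOF: SIGBUS */
    } else {
        faulted = 1;
    }

    sigaction(SIGBUS, &old, NULL);         /* put the previous handler back */
    munmap(p, 2 * pg);
    return faulted;
}
```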

Thoughts?

Jean-Baptiste Yunès
KarenRei

1 Answer


From the POSIX specification of mmap():

The mmap() function can be used to map a region of memory that is larger than the current size of the object. Memory access within the mapping but beyond the current end of the underlying objects may result in SIGBUS signals being sent to the process. The reason for this is that the size of the object can be manipulated by other processes and can change at any moment. The implementation should tell the application that a memory reference is outside the object where this can be detected; otherwise, written data may be lost and read data may not reflect actual data in the object.

Note that references beyond the end of the object do not extend the object as the new end cannot be determined precisely by most virtual memory hardware. Instead, the size can be directly manipulated by ftruncate().
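
Following that last sentence, one way to make the whole window valid is to grow the file to the window size before mapping it. A minimal sketch, assuming the program may open the database file read-write and that the legacy code tolerates a file whose on-disk size becomes the full window; map_window_over_file is a hypothetical name, and it maps at a kernel-chosen address rather than the fixed SMAddr to keep the example self-contained:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: opens path read-write, grows the file to at least
 * len bytes if it is shorter (so no page of the mapping lies past EOF),
 * then maps len bytes of it shared. Returns the mapping or MAP_FAILED. */
static void *map_window_over_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    void *p = MAP_FAILED;
    if (fstat(fd, &st) == 0 &&
        (st.st_size >= (off_t)len || ftruncate(fd, (off_t)len) == 0))
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    close(fd);          /* the mapping stays valid after close() */
    return p;
}
```

On filesystems that support sparse files the extended tail is typically a hole that reads back as zeros, so it costs little disk space until written; whether the database tooling accepts the larger file is something to check first.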

So most likely the bug is that your program tries to access a region of the mapped memory which lies outside the file. The mmap call itself should succeed, however. What return value do you get?
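
On the question of the return value: mmap reports failure by returning MAP_FAILED (that is, (void *) -1), not NULL, and sets errno. A minimal checking wrapper around the call from the question might look like this; checked_fixed_map is a hypothetical name:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical wrapper around the question's call: maps len bytes of fd at
 * exactly want and returns the address, or NULL (with errno preserved)
 * after printing why mmap failed. MAP_FILE is 0 on Linux and kept only to
 * mirror the original flags. */
static void *checked_fixed_map(void *want, size_t len, int fd, int read_only)
{
    int prot = PROT_READ | (read_only ? 0 : PROT_WRITE);
    void *p = mmap(want, len, prot, MAP_FILE | MAP_FIXED | MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {               /* (void *) -1, not NULL */
        int err = errno;                 /* fprintf may clobber errno */
        fprintf(stderr, "mmap(%p, %zu) failed: %s\n", want, len, strerror(err));
        errno = err;
        return NULL;
    }
    return p;
}
```

Note that in the backtraces in the comments the SIGBUS fires before control gets back to the caller of mmap, so in that case there is no return value to inspect.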

Alexander Torstling
  • The value I get is a crash - or more accurately, SIGBUS. :( And I agree, the mmap call should succeed. But it does not. – KarenRei Aug 16 '16 at 07:48
  • I should add that I've surrounded that statement with flushed debug output. Only the pre-mmap line gets reached. GDB stops at the mmap statement: – KarenRei Aug 16 '16 at 07:56
  • Program received signal SIGBUS, Bus error.
    0xb7739424 in __kernel_vsyscall ()
    (gdb)
    #0 0xb7739424 in __kernel_vsyscall ()
    #1 0xb7613638 in mmap () at ../sysdeps/unix/sysv/linux/i386/mmap.S:56
    #2 0x080a563f in fdps_shmat (SMFileDesc=8, SMAddr=0xb0000000, SMFlag=2048) at fdps_shm.c:284
    #3 0x08063339 in ET_open (p_fname=0xbf896939 "/home/fdps/fdps/dat/local/cassrw.dat", p_flag=2) at et_open.c:260
    #4 0x0806c294 in DB_open (p_fiid=0xbf896939 "/home/fdps/fdps/dat/local/cassrw.dat", p_flag=2, p_seg=0xb0000000 "FDPS\005") at db_open.c:104
    ... – KarenRei Aug 16 '16 at 07:57
  • 0xb0000000 (SMAddr in the backtrace) + 127926272 = 0xb7a00000; compare that with "0xb7739424 in __kernel_vsyscall ()". You have overwritten part of your address space. –  Aug 16 '16 at 09:39
  • I can *try* reducing the size of the allocation, but I know that a lot of work was put into choosing that size (again, I don't know why. But by – KarenRei Aug 16 '16 at 09:46
  • You can't "try reducing the size" and see if anything blows up. You have to investigate. Are there no senior engineers working in your company? –  Aug 16 '16 at 10:36
  • I reduced it to 118MB... but now the program crashes on the very next line, when I try to write out my next debugging statement (exactly the same debugging code as used elsewhere): – KarenRei Aug 16 '16 at 10:42
  • Program received signal SIGBUS, Bus error.
    0xb75b5210 in malloc@plt () from /lib/libc.so.6
    (gdb)
    #0 0xb75b5210 in malloc@plt () from /lib/libc.so.6
    #1 0xb760227c in __fopen_internal (filename=filename@entry=0x80c4678 "/tmp/test2", mode=mode@entry=0x80c4676 "a", is32=is32@entry=1) at iofopen.c:73
    #2 0xb760235b in _IO_new_fopen (filename=0x80c4678 "/tmp/test2", mode=0x80c4676 "a") at iofopen.c:103
    #3 0x080a564a in fdps_shmat (SMFileDesc=8, SMAddr=0xb0000000, SMFlag=2048) at fdps_shm.c:286
    #4 0x08063339 in ET_open (p_fname=0xbfa62939 "/home/fdps/fdps/dat/local/cassrw.dat", p_f... – KarenRei Aug 16 '16 at 10:44
  • Re: things blowing up: AFAIK, the needed size was determined by a "try and see what blows up" approach; the code is littered with past attempts :Þ I know, it's a mess... Will try further space reductions. – KarenRei Aug 16 '16 at 10:45
  • You could try to reserve that memory range when linking. See http://stackoverflow.com/questions/24118682/how-to-reserve-a-range-of-memory-in-data-section-ram-and-the-prevent-heap-stac – Alexander Torstling Aug 16 '16 at 14:08
  • Simple example: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/4/html/Using_ld_the_GNU_Linker/simple-example.html – Alexander Torstling Aug 16 '16 at 14:10
  • I think I understand now what you are trying to do: mmap the data segment of the executable to achieve persistence. You can then call sbrk(0) to get the end of the data segment. This is where your mapping should end. – Alexander Torstling Aug 16 '16 at 14:18
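
As a runtime alternative to the linker-script reservation suggested above (a swapped-in technique, not what those links describe), the window can be claimed early with an inaccessible anonymous mapping and the file mapped over it later with MAP_FIXED, which is then only replacing a mapping the program itself owns. A minimal sketch with hypothetical helper names, assuming it runs before anything else has been placed in the target range:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Step 1, run as early as possible: claim [want, want + len) with an
 * inaccessible anonymous mapping so later mmap/malloc calls in the process
 * cannot land there. want is passed as a hint (no MAP_FIXED), so nothing
 * existing is clobbered. Returns want, or NULL if that exact range was
 * not available. */
static void *reserve_window(void *want, size_t len)
{
    void *p = mmap(want, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    if (p != want) {           /* range was busy: the hint was not honoured */
        munmap(p, len);
        return NULL;
    }
    return p;
}

/* Step 2, once the file is open: replace (part of) the reservation with the
 * file. MAP_FIXED is safe here because the range belongs to us. */
static void *map_into_window(void *window, size_t len, int fd, int prot)
{
    void *p = mmap(window, len, prot, MAP_FIXED | MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

If reserve_window returns NULL, something (for example a shared library or the vDSO) already occupies part of the range, and no later step can make MAP_FIXED safe there. Pages of the window beyond the end of the file still raise SIGBUS when touched; the reservation only stops other mappings from landing in the window.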