Just for kicks, I'm trying to create a .dylib that intercepts malloc() calls. I want to print allocation sizes and caller addresses for later munging. The output format looks like this:

0x7fff93b8ffa7: 20 bytes
0x7fff69eaec18: 16 bytes
0x000100a2d45b: 8 bytes
0x000100a2d45b: 8 bytes
0x000100a2d45b: 8 bytes

This is the result of running a program like this:

DYLD_FORCE_FLAT_NAMESPACE=1 DYLD_INSERT_LIBRARIES=debugmalloc.dylib mybinary 2>debug.log

mybinary can be replaced by any other program. debugmalloc.dylib is the malloc() interceptor.
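As a rough illustration of the "later munging" step, here is a minimal Python sketch that tallies the log by caller address. It assumes debug.log contains only lines in the format shown above; the names `LINE_RE` and `tally` are made up for this example and are not part of any existing tool.

```python
import re
from collections import Counter

# Matches one record of the form "0x7fff93b8ffa7: 20 bytes".
LINE_RE = re.compile(r"^(0x[0-9a-fA-F]+):\s+(\d+)\s+bytes$")

def tally(lines):
    """Return (call count per caller, total bytes per caller)."""
    calls, total = Counter(), Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not an allocation record
        caller = int(m.group(1), 16)
        size = int(m.group(2))
        calls[caller] += 1
        total[caller] += size
    return calls, total

# The sample records from the question.
sample = """0x7fff93b8ffa7: 20 bytes
0x7fff69eaec18: 16 bytes
0x000100a2d45b: 8 bytes
0x000100a2d45b: 8 bytes
0x000100a2d45b: 8 bytes""".splitlines()

calls, total = tally(sample)
for caller, n in calls.most_common():
    print("0x%012x: %d calls, %d bytes" % (caller, n, total[caller]))
```

For a real run, `tally(open("debug.log"))` would replace the inline sample.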

Now I can start creating histograms and calculating waste! There's just one thing missing though: I'd like to see the names (symbols) of the actual (public) callers instead of just the addresses. So I start looking around and find some interesting pieces of the puzzle.

  • Using nm: Offset in nm symbol value?
  • Using lldb to symbolicate with image lookup -address: http://lldb.llvm.org/symbolication.html (using otool -l $BINARY to find the address of the __TEXT.__text section and checking whether offsetting by that works). I was hoping this would be even better, because lldb has debugging information and so I would be able to symbolicate internal symbols as well.

But to date I haven't been able to work back to the file address of the symbol. lldb's image lookup -address stays silent. My next step is probably to look at the otool -l output some more to see if I can square it with some of the addresses I'm seeing, but that will be tedious manual work. If someone has a better idea of how to do this, I'm all ears.

Aktau
  • Most linkers have the ability to output a symbol map, which a program can parse into a table and use to create a histogram. For some embedded projects, I've used a timed interrupt that was independent of the operating system timer to sample where the PC was, and in turn which function was active at the time of the sample, in order to generate histograms showing which functions took the most time. – rcgldr Dec 07 '14 at 23:40
  • You may already be aware, but the standard malloc library can already log malloc history. See the man page or do `MallocHelp=1 /usr/bin/true`. As for converting addresses to symbols, use [`atos`](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/atos.1.html). It's best to apply it against a running process rather than just an executable file, since the dynamic loader can slide images from their "standard" location. At run time, you could also use [`dladdr()`](https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/dladdr.3.html). – Ken Thomases Dec 08 '14 at 00:04
  • @rcgldr: I'm aware of the coolness of stack-based sampling, thanks :). My current goal is a bit more limited though: address to symbol resolving after the executable has ended. – Aktau Dec 08 '14 at 08:38
  • @KenThomases: I was indeed aware of this, my interest is a tad more academic though, thanks! I will most definitely look to see if I can use either atos or dladdr to solve my issues (I think one of the main ones is finding the offset at which the __TEXT.__text section was loaded when the binary was run). – Aktau Dec 08 '14 at 08:40
  • You can use `_dyld_get_image_vmaddr_slide()` to get the slide of each image. I'm not sure if image 0 is always the executable. You may need to use `_dyld_get_image_name()` to check. You can log that slide when your malloc library is initialized and then use it when running `atos` later. – Ken Thomases Dec 08 '14 at 16:20
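To make the slide idea in the comments above concrete, here is a small Python sketch of the after-the-fact step. It assumes the interposer logged the slide of the image of interest at startup. The slide value 0xa2c000 is invented for illustration, and the helper names `unslide` and `atos_command` are hypothetical. The `-o` and `-s` options are taken from the atos man page as I remember it, so check `man atos` on your system before relying on them.

```python
def unslide(runtime_addr, slide):
    """Convert an address seen at run time into the address in the file on disk."""
    return runtime_addr - slide

def atos_command(binary, slide, runtime_addrs):
    """Build (but do not run) an atos invocation for the recorded addresses.

    With -s, atos is given the slide and applies it to the run-time
    addresses itself, so the addresses are passed through unchanged.
    """
    return ["atos", "-o", binary, "-s", hex(slide)] + [hex(a) for a in runtime_addrs]

# Hypothetical slide logged by the interposer when it was initialized.
slide = 0xa2c000
caller = 0x000100a2d45b  # one of the caller addresses from the log

print(hex(unslide(caller, slide)))
print(" ".join(atos_command("mybinary", slide, [caller])))
```

On a Mac the resulting list could be handed to `subprocess.run`. The sketch stops at building the command so that it stays runnable anywhere.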

1 Answer

"image lookup" on an address will show the file address:

(lldb) image lookup -a 0x7fff89e6752e
      Address: libsystem_kernel.dylib[0x000000000001152e] (libsystem_kernel.dylib.__TEXT.__text + 68126)
      Summary: libsystem_kernel.dylib`mach_msg_trap + 10

The number in square brackets is the file address. Of course, not all the functions in system libraries are going to have symbols, since many of them are stripped. lldb reads the "Function Starts" section in the library (if it exists), and composes fake symbols for these, so we won't attribute addresses to the wrong symbol in general, but the names won't be very helpful.

You can also get the file address on the SB API side by making an SBAddress with the load address and the target, and then reading the file address from the SBAddress:

(lldb) script
Python Interactive Interpreter. To exit, type 'quit()', 'exit()' or Ctrl-D.
>>> addr = lldb.SBAddress(0x7fff89e6752e, lldb.target)
>>> print("0x%x" % addr.file_addr)
0x1152e
Jim Ingham
  • I probably explained myself badly. In the last part of my question I state that I already tried to use lldb to symbolicate with `image lookup -a`, and it didn't even respond. So I assumed that I need to use some sort of offset to find the "real" file address. When you say "address" in your first sentence, do you mean the address I record from the debug malloc? Because that doesn't seem to work. Unless I'm loading lldb wrong: do you use any special commands to fire it up? – Aktau Dec 09 '14 at 22:36
  • Ah, I didn't see you were doing this after the fact, my bad. You will need to record not just addresses, but addresses + containing shared library + library offset. dladdr will tell you these things. Then you can run lldb and do "target create -d " and use that to look up the unslid address. The -d is important since all the dependencies will overlay each other and you'll get bogus matches. – Jim Ingham Dec 09 '14 at 23:36
  • Another approach is to record the shared library state (including load addresses of all the libraries) at the time you took the addresses, and use lldb to build up a replica of the program in that state. Then you can use "image lookup", disassemble, etc on the bare addresses. The "symbolicate.py" module in lldb's Python has some classes to help with this, and crashlog.py is an example of doing this by reading the library list from the Crash Log. – Jim Ingham Dec 09 '14 at 23:39
  • Note that your job is harder, since libraries come & go, so you'll have to keep some library generation index, and record that along with your addresses. – Jim Ingham Dec 09 '14 at 23:40
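The bookkeeping described in these comments (address + containing library + library offset, plus a generation index because libraries come and go) could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the `Image` record, the `resolve` helper, the plugin names and all load ranges are invented. The libsystem_kernel.dylib base was chosen only so that the example agrees with the file address 0x1152e shown in the answer.

```python
from collections import namedtuple

# One record per image load. first_gen/last_gen bound the "library
# generations" during which the image occupied [load_start, load_end).
Image = namedtuple("Image", "name load_start load_end first_gen last_gen")

def resolve(images, addr, gen):
    """Return (image name, offset from image base) for addr at generation gen, or None."""
    for img in images:
        alive = img.first_gen <= gen <= img.last_gen
        if alive and img.load_start <= addr < img.load_end:
            return img.name, addr - img.load_start
    return None

# Hypothetical load map: plugin_a is unloaded after generation 0 and
# plugin_b is later loaded at the same base address.
images = [
    Image("libsystem_kernel.dylib", 0x7fff89e56000, 0x7fff89e74000, 0, 1),
    Image("plugin_a.dylib",         0x101000000,    0x101010000,    0, 0),
    Image("plugin_b.dylib",         0x101000000,    0x101008000,    1, 1),
]

for addr, gen in [(0x7fff89e6752e, 0), (0x101000f00, 0), (0x101000f00, 1)]:
    print(hex(addr), gen, resolve(images, addr, gen))
```

The same bare address resolves to different libraries depending on the generation, which is why the generation has to be logged next to each address.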