A given C statement may be compiled into several machine instructions, and several of them may access memory. Think of something like ptr->fld = arr[i++] * arr[j]--;
By the way, in some cases arr[j] might have been used earlier and could already sit in a register, so it might not need another memory load (only a store, which could be deferred until later).
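To make this concrete, here is a compilable version of that statement. The struct s type, the globals and the function name are hypothetical, chosen only for illustration. Even this single line performs several loads and stores (listed in the comment), and an optimizing compiler may reorder them, merge them, or reuse values already loaded into registers. The assert also hints at why undefined behavior matters here: if i == j, reading arr[i] and decrementing arr[j] would be unsequenced accesses to the same object.

```c
#include <assert.h>
#include <stddef.h>

struct s { int fld; };          /* hypothetical type, for illustration only */

struct s obj;
struct s *ptr = &obj;
int arr[8] = {0, 1, 2, 3, 4, 5, 6, 7};
size_t i = 2, j = 5;

void example_stmt(void)
{
    /* If i == j the statement below has undefined behavior. */
    assert(i != j);
    /* Unoptimized, this one statement roughly does:
         load i, load arr[i], store i+1,
         load j, load arr[j], store arr[j]-1,
         load ptr, store the product into ptr->fld.
       With optimization, i and j may be reused from the registers
       loaded by the assert above, and the stores may be reordered. */
    ptr->fld = arr[i++] * arr[j]--;
}
```

After one call with the initial values above, obj.fld holds 2 * 5, i becomes 3 and arr[5] becomes 4, since arr[j]-- yields the old value of arr[j].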
I want to know the location, in executable, of the machine instruction that accesses (heap, global or local) memory generated by the given code
So your question might not make sense in general: several machine instructions (or none of them) might access memory on behalf of a single C statement in your source code. Register allocation and register spilling also happen, so a given machine instruction might be related to a C variable quite far from the "current" C statement (a notion that has no precise meaning here).
An optimizing compiler is allowed to interleave several C statements and might output intermixed machine code. Read also about sequence points. There is no obvious mapping between machine instructions and C statements (notably with optimizations enabled), which is why you often debug with fewer optimizations enabled (so gcc -g works best with -O0 or -Og, not more).
With GCC, compile your src.c source file using
gcc -O -S -Wall -fverbose-asm src.c
and you'll get a slightly more readable src.s assembler file. You can use an editor or pager to look into that generated file.
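As a hypothetical src.c to try that command on, the small function below does one load from src and one store to dst per iteration. In the generated src.s, the memory-accessing instructions are the ones with memory operands (on x86-64 with the default AT&T syntax, operands written like (%rsi,%rax,4)), and -fverbose-asm typically annotates them with comments naming the corresponding GCC temporaries and C expressions. The exact output depends on your GCC version and target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function: dst[idx] = src[idx] * k for each idx. */
void scale(int *dst, const int *src, size_t n, int k)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t idx = 0; idx < n; idx++)
        dst[idx] = src[idx] * k;   /* one load from src, one store to dst */
}
```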
Does anyone know how to get only instructions that access memory?
That does not make much sense. An optimizing compiler would sometimes share some common machine code related to several different C statements.
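A small illustration of such sharing, using a hypothetical struct point: both statements read p->x, and since nothing is stored in between, an optimizing compiler (e.g. gcc -O2) will usually load p->x once and reuse that register for both. The single load instruction then "belongs" to two C statements, and neither one alone.

```c
#include <assert.h>
#include <stddef.h>

struct point { int x; };        /* hypothetical type, for illustration only */

int twice_plus_one(const struct point *p, int *out2)
{
    assert(p != NULL && out2 != NULL);
    int a = p->x * 2;           /* statement 1 reads p->x */
    *out2 = p->x + 1;           /* statement 2 reads p->x again; the compiler
                                   may reuse the value loaded for statement 1 */
    return a;
}
```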
BTW, you might also ask GCC to dump various internal representations, for example using gcc -O -fdump-tree-all; then you get hundreds of (textual) internal dump files (partially dumping various internal representations). Remember that GCC has hundreds of optimization passes.
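To get a feel for what the GIMPLE-level dumps express, here is the statement from the beginning of this answer rewritten by hand into a three-address style roughly similar to GIMPLE, where each assignment performs at most one memory access. This is a hand-written approximation, not actual GCC output: the real dump uses compiler-generated temporaries and may pick a different, equally valid order. The functions direct and lowered, and the use of pointers pi and pj for the indices, are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct s { int fld; };          /* hypothetical type, for illustration only */

/* The original one-line statement (pi, pj must not point into arr). */
void direct(struct s *ptr, int *arr, size_t *pi, size_t *pj)
{
    assert(*pi != *pj);         /* otherwise undefined behavior */
    ptr->fld = arr[(*pi)++] * arr[*pj]--;
}

/* The same computation, one memory access per line. */
void lowered(struct s *ptr, int *arr, size_t *pi, size_t *pj)
{
    size_t t1 = *pi;            /* load i        */
    *pi = t1 + 1;               /* store i       */
    int t3 = arr[t1];           /* load arr[i]   */
    size_t t4 = *pj;            /* load j        */
    int t5 = arr[t4];           /* load arr[j]   */
    arr[t4] = t5 - 1;           /* store arr[j]  */
    int t7 = t3 * t5;           /* pure register arithmetic */
    ptr->fld = t7;              /* store ptr->fld */
}
```

Both functions compute the same result; the lowered form simply makes every load and store explicit, which is what makes detecting memory accesses at that level easier than at the C source level.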
Notice that you might be more interested in working on GCC internal representations (e.g. GENERIC, GIMPLE or even RTL) by adding your own GCC plugin (or GCC MELT extensions). That could require months of work (notably to understand the details of GCC's internal architecture and representations).
Without understanding your high-level goals and motivations, we cannot help you more.
You should read much more about semantics and about undefined behavior, which is (indirectly) more relevant to your question than you might believe.
Notice that there is no simple one-to-many correspondence from C statements to machine instructions. An optimizing compiler doesn't compile C statements one by one; it compiles an entire translation unit at once (and may, for example, do inline expansion, loop unrolling, stack unwinding, constant folding, register allocation and spilling, interprocedural optimizations and dead code elimination). This is why C compilers are such complex beasts, with many millions of lines of source code. BTW, most C compilers (e.g. GCC or Clang) are free software, so you can spend several months or years studying their source code.
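A tiny example of inline expansion plus constant folding: the helper square and the function ten are hypothetical. With gcc -O2 (and with Clang at similar levels), ten() will very likely compile to little more than "return 10", so no machine instruction corresponds to the call to square or to the multiplication, and the assert is folded away as well since its condition is known at compile time.

```c
#include <assert.h>

/* Small static helper that an optimizing compiler will very likely inline. */
static int square(int x)
{
    return x * x;
}

int ten(void)
{
    int r = square(3) + 1;      /* folded to the constant 10 after inlining */
    assert(r == 10);            /* statically true, so also folded away */
    return r;
}
```

Comparing the src.s produced with -O0 and with -O2 for such a function is a quick way to see how weak the statement-to-instruction mapping becomes once optimizations are enabled.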
Also read a good book on compilers (e.g. the latest Dragon Book), some books on semantics, and some on programming language pragmatics.
If you are interested in GCC internals specifically, my documentation page (also available here) of GCC MELT contains lots of slides and references.
If you only care about machine instructions, you might entirely forget about C and work only on machine code in object files, with the help of some disassembler library like libopcodes (see this).
Look also into other static source code analyzers, including Coccinelle, Frama-C and libclang.
If you are interested only in GCC-emitted code and can afford recompiling your C source code, you might instead work inside the GCC compiler (through your GCC plugin or GCC MELT extension) at the GIMPLE level and detect (and perhaps transform) those GIMPLE instructions accessing memory. Detecting (and perhaps transforming) only the GIMPLE statements that modify memory could be simpler and might be enough.
I want to test the system by inserting a breakpoint and comparing memory before and after the breakpoint.
This is a bit similar to, e.g., address sanitizers and other instrumentation features of GCC. You could spend several years working on something similar (transforming some GIMPLE); you would probably want to add several additional passes to GCC (and you might need some extra runtime support).
Notice however that recent GDB is scriptable (in Guile or Python) and has watchpoints. If you just want to debug one particular program, that might be enough (and you might not need to dive into compiler internals, which would take many months or years of work). You should also use valgrind and address sanitizers.
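If all you need is the "compare memory before and after" part for a known region, a plain in-process sketch may be enough. The helper count_changed_bytes and the demo callback demo_write_second_int below are hypothetical: the helper snapshots a region, runs some code, and counts the bytes that differ afterwards. In GDB, the closer equivalent is a watchpoint such as watch v[1], which stops execution whenever that location is written and shows you the responsible code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Run fn(arg) and count how many bytes of [base, base+len) it changed.
   Returns (size_t)-1 if the snapshot buffer cannot be allocated. */
size_t count_changed_bytes(void *base, size_t len,
                           void (*fn)(void *), void *arg)
{
    assert(base != NULL && fn != NULL);
    unsigned char *before = malloc(len ? len : 1);
    if (before == NULL)
        return (size_t)-1;
    memcpy(before, base, len);          /* snapshot "before the breakpoint" */
    fn(arg);                            /* code under test */
    const unsigned char *after = base;
    size_t changed = 0;
    for (size_t k = 0; k < len; k++)
        if (before[k] != after[k])
            changed++;
    free(before);
    return changed;
}

/* Hypothetical code under test: writes 42 into the second int of an array. */
void demo_write_second_int(void *p)
{
    ((int *)p)[1] = 42;
}
```

Unlike a watchpoint, this only tells you that something in the region changed, not which instruction changed it; and it only sees the region you pass in.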