4

I've spent a good amount of time studying TOC and compiler design. I'm not done yet, but I feel comfortable with the concepts. On the other hand, I have only a very shallow knowledge of assembly and machine code, and I have always had the desire/need to connect the two sides (the HLL and LLL representations of the code), as I'm learning C++ while paying great attention to performance and optimization discussions.

C++ is a statically typed language:

My question is: when our variables appear in expressions in the statements of the code, do all these variables (and other entities with identifiers) become, at runtime, mere instructions that address positions in virtual memory (for statics and globals) or positions relative to a stack address (for local variables)?

I mean, after a successful compilation, including syntactic and semantic verification, isn't it wise to treat data at runtime as guaranteed regions of target memory bytes, without any reference to identifiers or any checking, and with the symbol table no longer needed?

If my question appears to be the type of question that comes from a lack of learning effort (which I hope it doesn't), please just tell me so, and tell me where to read. If that is the case, it's honestly because I'm concentrating on C++ nowadays and haven't yet had the chance to gain a sound knowledge of low-level languages. I apologize for that in advance.

Physician
  • 4
    That is basically how statically compiled languages without introspection or reflection work. When you compile a C++ source file, the object file generated by the compiler has no reference to the variables in the source. It's all memory locations. – Some programmer dude Jul 13 '16 at 13:01
  • 2
    Yes, your understanding is correct. – Jonathan Wakely Jul 13 '16 at 13:02
  • TLDR; and confusing. But `C++ is a statically typed language:` is wrong. It includes C. –  Jul 13 '16 at 13:05
  • 2
    @DieterLücking, C++ does not include the C language. Specifically, implicit conversions between enumeration types are forbidden, as are implicit conversions between different pointer types. – Jonathan Wakely Jul 13 '16 at 13:08
  • @DieterLücking: In what way is C not statically typed? – Benjamin Lindley Jul 13 '16 at 13:09
  • @BenjaminLindley Simple example: printf –  Jul 13 '16 at 13:13
  • 1
    @DieterLücking using a statically typed language to build dynamically typed behaviour does not mean the language is not statically typed – underscore_d Jul 13 '16 at 13:17
  • 1
    @DieterLücking: I have no idea how printf is an example that demonstrates that C is not statically typed. Maybe we are using different definitions. – Benjamin Lindley Jul 13 '16 at 13:18
  • @JonathanWakely your argumentation is reverse. –  Jul 13 '16 at 13:19
  • 1
    @BenjaminLindley While I think C is indeed a statically typed language, there are cases where a procedure can expect a type `X`, but the user supplies an object of type `Y` and the error is not found by the type system at compile time (e.g. `printf`, as said above). This is because C does not have templates (or variadic templates, in the case of `printf`), and therefore some code needs type erasure. So while both C and C++ are statically typed languages, C requires you to break from the static typing sometimes. – KABoissonneault Jul 13 '16 at 13:20
  • @DieterLücking, which "argumentation is reverse"? – Jonathan Wakely Jul 13 '16 at 13:49

2 Answers

3

You're spot on. Once compiled to machine code, there is no longer any notion of a variable identifier (or variable type, for that matter). It's just bytes at a certain location. That location was determined by the compiler (when compiling), which is where the variable name was resolved, or by the linker (when linking) in the case of global variables.
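A minimal sketch of that idea, not a definitive demonstration. The names `global_counter`, `read_int_at` and `local_demo` are made up for illustration. The helper receives nothing but an address and copies `sizeof(int)` bytes from it; no identifier or type information is looked up at runtime.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical global: its final address is settled at link/load time.
int global_counter = 42;

// Copies sizeof(int) raw bytes from the given address into an int.
// Only the address is used; no name or type is consulted at runtime.
int read_int_at(const void* address) {
    assert(address != nullptr);
    int value;
    std::memcpy(&value, address, sizeof value);
    return value;
}

// Hypothetical local: typically a stack slot (or just a register).
int local_demo() {
    int local_value = 7;
    return read_int_at(&local_value);
}
```

Calling `read_int_at(&global_counter)` yields the stored value of the global, and `local_demo()` does the same for a local; in both cases the generated code only ever sees an address.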

Of course, it can be useful to retain information such as identifiers for debugging purposes. This is precisely what "compilation with debug information" means: when you do that, the compiler will somehow embed the (redundant) identifiers into the generated code such that a debugger can access them, or put them in a separate file alongside it. The details depend on the format of the debugging information.

Angew is no longer proud of SO
2

Yes, mostly. There are a few details that will make identifiers remain more than just addresses or stack offsets.

First, we have RTTI in C++, which means that at runtime the names of at least some types may still be available. For example:

const std::type_info &info = typeid(*ptr_interface);
std::cout << info.name() << std::endl;

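would print an implementation-defined name (often a mangled one) for the type of *ptr_interface. If that type is polymorphic, it is the dynamic type of the object.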
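For completeness, here is a self-contained sketch around that snippet. `Interface`, `Implementation` and the two helper functions are hypothetical names chosen for illustration; the only assumption is that RTTI is enabled, which is the default in common compilers.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// A polymorphic base is required for typeid to report the dynamic type.
struct Interface {
    virtual ~Interface() = default;
};

struct Implementation : Interface {};

// Returns the implementation-defined name of the object's dynamic type.
std::string dynamic_type_name(const Interface* ptr_interface) {
    assert(ptr_interface != nullptr);
    const std::type_info& info = typeid(*ptr_interface);
    return info.name();
}

// True if the object behind the pointer is exactly an Implementation.
bool is_implementation(const Interface* ptr_interface) {
    assert(ptr_interface != nullptr);
    return typeid(*ptr_interface) == typeid(Implementation);
}
```

Comparing `type_info` objects is portable, while the string returned by `name()` is not: it differs between compilers, so it is better suited to diagnostics than to program logic.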

Second, due to the way a program is linked, the symbols from the object files may still be present in the executing image. The Linux kernel, for example, makes use of this: it can produce a backtrace of the stack that includes the function names. It also uses knowledge of function names in order to load and link modules. Similar functionality exists in the GNU C library, which, when the program is linked accordingly, is able to retrieve function names for stack traces.
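A sketch of the GNU C library facility, assuming a glibc-based system where `<execinfo.h>` provides `backtrace` and `backtrace_symbols`. The wrapper `current_stack_symbols` is a hypothetical name. Whether function names show up in the strings depends on how the program was linked (with GCC, for instance, `-rdynamic` exports the symbols of the executable); otherwise you may only see addresses.

```cpp
#include <cassert>
#include <cstdlib>
#include <execinfo.h>  // glibc-specific: backtrace, backtrace_symbols
#include <string>
#include <vector>

// Collects a textual description of each frame on the current call stack.
std::vector<std::string> current_stack_symbols() {
    void* frames[32];
    const int count = backtrace(frames, 32);
    assert(count >= 0);

    // backtrace_symbols allocates one block with malloc; the caller frees it.
    char** symbols = backtrace_symbols(frames, count);
    std::vector<std::string> result;
    if (symbols != nullptr) {
        result.assign(symbols, symbols + count);
        std::free(symbols);
    }
    return result;
}
```

Note that the names come from the symbol information kept in the image, not from anything the running code needs in order to execute.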

In normal cases, though, the code will not be affected by the original names of the variables (but the compiler will of course emit code suitable for the type the variable has).

skyking
  • Thank you for the clarification. I think that for, let's say, simple variable accesses as l-values or r-values in ordinary code such as x = y + 1;, the variable access expression becomes an addressing instruction that is checked for neither identifier nor type at runtime. Is it accurate to assume this? – Physician Jul 13 '16 at 13:26
  • 1
    Do note that RTTI is a feature that can be turned on and off in the compiler, and it adds extra overhead to the program. – NathanOliver Jul 13 '16 at 13:26
  • 1
    @Physician well, the machine code itself will use appropriately wide registers/alignment to access the memory allocated for the variable, but it doesn't innately care what the C++ code looked like, if that's ultimately your question. and it certainly doesn't care what variables were named. – underscore_d Jul 13 '16 at 13:28
  • No, my question is actually about being able to get rid of the symbol table entirely for simple l-value or r-value variable expressions, such as this very statement: int x = 2, y = 3; or this: x = y + 1. That is regardless of whether the symbol table still has additional uses when intentionally wanted. I ask because I can't help but feel that accessing the symbol table at every variable access is a bad idea that seems very easily avoidable. – Physician Jul 13 '16 at 13:37
  • 1
    @Physician You seem to be confusing some things, although it's hard to tell what exactly -- the only time an expression like "int x = y+1" would go through any kind of table is if y is defined as an extern variable, e.g. in a shared library. Otherwise, the generated code will likely just be something like "add $1, %rax". I encourage you to compile some simple statements and look at the generated assembler code. – Benno Jul 13 '16 at 16:39
  • 1
    @Physician It is a bad idea, which is why it doesn't happen. I'm not sure why you feel all this vague concerned pondering is necessary when, as Benno said, you could simply compile a program (with progressive levels of optimisation) and see for yourself what typical C++ implementations do. – underscore_d Jul 13 '16 at 17:07
  • Yeah, honestly, it's because I don't know how to look at a compiled program. I'll check that out, thanks indeed. – Physician Jul 13 '16 at 20:20
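As a starting point for the experiment suggested in the comments, here is a small sketch. The function name `add_one` and the file name `add_one.cpp` are hypothetical. With GCC, `g++ -S -O2 add_one.cpp` stops after code generation and writes the assembly to `add_one.s`; other compilers have comparable options.

```cpp
// add_one.cpp (hypothetical file name)

// Wraps the statement from the question so the compiler has to emit code for it.
int add_one(int y) {
    int x = y + 1;  // the names x and y exist only in the source
    return x;
}
```

On x86-64 with optimization enabled, the body typically reduces to a single arithmetic instruction followed by a return, and the identifiers `x` and `y` do not appear anywhere in the instructions. Only the function's own symbol remains, because the linker needs it.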