What is the address of a function in a C++ program?

Question

As the function is set of instruction stored in one contiguous block of memory.

And address of a function (entry point) is the address of the first instruction in the function. (from my knowledge)

And thus we can say that the address of function and the address of the first instruction in the function will be the same (In this case the first instruction is the initialization of a variable.).

But the program below contradicts the above line.

code:

#include<iostream>
#include<stdio.h>
#include<string.h>
using namespace std;
char ** fun()
{
    static char * z = (char*)"Merry Christmas :)";
    return &z;
}
int main()
{
    char ** ptr = NULL;

    char ** (*fun_ptr)(); //declaration of pointer to the function
    fun_ptr = &fun;

    ptr = fun();

    printf("\n %s \n Address of function = [%p]", *ptr, fun_ptr);
    printf("\n Address of first variable created in fun() = [%p]", (void*)ptr);
    cout<<endl;
    return 0;
}

One output example is:

 Merry Christmas :) 
 Address of function = [0x400816]
 Address of first variable created in fun() = [0x600e10]

So, here the address of function and the address of first variable in function is not same. Why so?

I searched on google but can't come up with the exact required answer and being new to this language I exactly can't catch some of contents on net.

" here the address of function and the address of first variable in function is not same. Why so?" - Why should it be the same? It is not an instruction — Ed Heal, Dec 25 '15 at 14:24
The variables in the function are not always stored with the function. The variables are stored somewhere else, like on the `stack`. That way the function can utilize the push & pop operations. — Greg M, Dec 25 '15 at 14:32
@GregM In a book it's written that a function doesn't have is own address but it's address is just the address of the beginning of the stack in the memory and as far as I know that the beginning of the stack starts with the initialization of variable (or instructions ). — Amit Upadhyay, Dec 25 '15 at 14:36
@AmitUpadhyay you can use this website to look at assembly code: https://assembly.ynh.io/ — Greg M, Dec 25 '15 at 14:45
@AmitUpadhyay Either that book you mentioned in an above comment is horribly wrong, or you misunderstood what it said. What you claim that book said is garbage. — JSF, Dec 25 '15 at 15:04
@AmitUpadhyay - there's a stack pointer and there's an instruction pointer. Both are different things. — Danny_ds, Dec 25 '15 at 15:33
I think the address of the function is the address of the label that defines the function. — Brandon, Dec 25 '15 at 19:27
Not related to the question but I think you should use `const char*` instead of casting that literal to `char*`. — Marian Spanik, Dec 26 '15 at 11:28
@MarianSpanik That gives `error: invalid conversion from ‘const char**’ to ‘char**’ [-fpermissive]` — Amit Upadhyay, Dec 26 '15 at 12:14
@AmitUpadhyay: Yes, because you should _also_ change the `char**` to `const char**`. Basic `const`-correctness. — Lightness Races in Orbit, Dec 26 '15 at 21:23
Fun fact: a C "function pointer" is *not* required to be the address of the machine code of a function, and there have been at least two production ABIs (IBM's POWER, and IA64) where it was the address of a data object! — zwol, Dec 27 '15 at 06:16
*"And thus we can say that the address of function and the address of the first instruction in the function"* No. First off, the address doesn't have to be the code (see zwol's point). Second, C++ (and C) functions usually have a setup instruction or two before the first instruction that the function's (visible) code generates. — T.J. Crowder, Dec 27 '15 at 10:19

score 57 · Accepted Answer · edited Dec 26 '15 at 17:57

57

So, here the address of function and the address of first variable in function is not same. Why so?

Why it would be so? A function pointer is a pointer that points to the function. It does not point to the first variable inside the function, anyway.

To elaborate, a function (or subroutine) is a collection of instructions (including variable definition and different statements/ operations) that performs a specific job, mostly multiple times, as required. It is not just a pointer to the elements present inside the function.

The variables, defined inside the function are not stored in the same memory area as that of the executable machine code. Based on the storage type, the variables which are present inside the function are located in some other part of the memory of the executing program.

When a program is built (compiled into an object file), different part of the program gets organized in a different manner.

Usually, the function (executable code), resides in a separate segment called code segment, usually a read-only memory location.
The compile time allocated variable, OTOH, are stored into the data segment.
The function local variables, usually are populated onto the stack memory, as and when needed.

So, there is no such relation that a function pointer will yield the address of the first variable present in the function, as seen in the source code.

In this regard, to quote the wiki article,

Instead of referring to data values, a function pointer points to executable code within memory.

So, TL;DR, the address of a function is a memory location inside the code (text) segment where the executable instructions reside.

edited Dec 26 '15 at 17:57

hyde

60,639
21
115
176

answered Dec 25 '15 at 14:25

Sourav Ghosh

133,132
16
183
261

okay, I read somewhere in a book that a function doesn't have is own address but it's address is just the address of the beginning of the stack in the memory and as far as I know that the beginning of the stack starts with the initialization of variable (or instructions ). – Amit Upadhyay Dec 25 '15 at 14:31
@AmitUpadhyay I really don't understand what you're talking about. What stack? What beginning? – Sourav Ghosh Dec 25 '15 at 14:33
@Amit Consider a recursion. You need the same instructions, but also multiple stack frames to hold the local data. – LogicStuff Dec 25 '15 at 14:39
@AmitUpadhyay: I think you've got that backwards. On most architectures, the first value of the stack-frame is the return address for the current call _(which I guess you could consider a function pointer, though it technically has no high-level equivalent in C++)_, not the other way around. – BlueRaja - Danny Pflughoeft Dec 25 '15 at 21:43
*"The variables, defined inside the function are not necessarily stored in the same memory address as that of the executable part."*, umm, just what you mean by "executable code is not necessarily in the same memory address as variables"? – hyde Dec 26 '15 at 15:44
@hyde It is regarding to the data and code segment part. I tried to clarify but if you feel it can be improved by using any alternative wordings, please feel free to edit. Thanks. :) – Sourav Ghosh Dec 26 '15 at 15:46
@SouravGhosh Edited. Roll back or edit further if you disagree. – hyde Dec 26 '15 at 17:58
Dear Downvoter, your silence will not help to improve the quality of the post, anyway. If you have some suggestion, please let me know so that the post can be improved. Thanks. :) – Sourav Ghosh Jan 18 '16 at 09:35

score 17 · Answer 2 · answered Dec 25 '15 at 19:02

17

A function's address is only a symbolic way to hand this function around, like pass it in a call or such. Potentially, the value you get for the address of a function is not even a pointer to memory.

Functions' addresses are good for exactly two things:

to compare for equality p==q, and
to dereference and call (*p)()

Anything else you try to do is undefined, might or might not work, and is compiler's decision.

answered Dec 25 '15 at 19:02

Aganju

6,295
1
12
23

4

In particular, you could easily have a conforming C++ implementation that's designed as an interpreter written in an object-oriented language like e.g. Java. Function pointers in this implmentation might be something like integer indexes into an array of opaque bytecode objects, and have no connection to machine code (or even interpreter code) at all. – Alex Celeste Dec 26 '15 at 12:40

score 13 · Answer 3 · answered Dec 26 '15 at 19:18

Alright, this is going to be fun. We get to go from the extremely abstract concept of what a function pointer is in C++ all the way down to the assembly code level, and thanks to some of the particular confusions we're having, we even get to discuss stacks!

Let's start at the highly abstract side, because that's clearly the side of things you're starting from. you have a function char** fun() that you're playing with. Now, at this level of abstraction, we can look at what operations are permitted on function pointers:

We can test if two function pointers are equal. Two function pointers are equal if they point at the same function.
We can do inequality testing on those pointers, allowing us to do sorting of such pointers.
We can deference a function pointer, which results in a "function" type which is really confusing to work with, and I will choose to ignore it for now.
We can "call" a function pointer, using the notation you used: fun_ptr(). The meaning of this is identical to calling whatever function is being pointed at.

That's all they do at the abstract level. Underneath that, compilers are free to implement it however they see fit. If a compiler wanted to have a FunctionPtrType which is actually an index into some big table of every function in the program, they could.

However, this is typically not how it is implemented. When compiling C++ down to assembly/machine code, we tend to take advantage of as many architecture-specific tricks as possible, to save runtime. On real life computers, there is almost always an "indirect jump" operation, which reads a variable (usually a register), and jumps over to begin executing the code that's stored at that memory address. Its almost univeral that functions are compiled into contiguous blocks of instructions, so if you ever jump to the first instruction in the block, it has the logical effect of calling that function. The address of the first instruction happens to satisfy every one of the comparisons required by C++'s abstract concept of a function pointer and it happens to be exactly the value the hardware needs to use an indirect jump to call the function! That's so convenient, that virtually every compiler chooses to implement it that way!

However, when we start talking about why the pointer you thought you were looking at was the same as the function pointer, we have to get into something a bit more nuanced: segments.

Static variables are stored separate from the code. There's a few reasons for that. One is that you want your code as tight as possible. You don't want your code speckled with the memory spaces to store variables. It'd be inefficient. You'd have to skip over all sorts of stuff, rather than just getting to plow through it. There's also a more modern reason: most computers allow you to mark some memory as "executable" and some "writable." Doing this helps tremendously for dealing with some really evil hacker tricks. We try to never mark something as both executable and writable at the same time, in case a hacker cleverly finds a way to trick our program into overwriting some of our functions with their own!

Accordingly, there is typically a .code segment (using that dotted notation simply because its a popular way to notate it in many architectures). In this segment, you find all of the code. The static data will go in somewhere like .bss. So you may find your static string stored quite far away from the code that operates on it (typically at least 4kb away, because most modern hardware allows you to set execute or write permissions at the page level: pages are 4kb in lots of modern systems)

Now the last piece... the stack. You mentioned storing things on the stack in a confusing way, which suggests it may be helpful to give it a quick going over. Let me make a quick recursive function, because they are more effective at demonstrating what is going on in the stack.

int fib(int x) {
    if (x == 0)
        return 0;

    if (x == 1)
        return 1;

    return fib(x-1)+fib(x-2);
}

This function calculates the Fibonacci sequence using a rather inefficient but clear way of doing it.

We have one function, fib. This means &fib is always a pointer to the same place, but we're clearly calling fib many times, so each one needs its own space right?

On the stack we have what are called "frames." Frames are not the functions themselves, but rather they are sections of memory which this particular invocation of the function is allowed to use. Every time you call a function, like fib, you allocate a little more space on the stack for its frame (or, more pedantically, it will allocate it after you make the call).

In our case, fib(x) clearly needs to store the result of fib(x-1) while executing fib(x-2). It can't store this in the function itself, or even in the .bss segment because we don't know how many times it is going to get recursed. Instead, it allocates space on the stack to store its own copy of the result of fib(x-1) while fib(x-2) is operating in its own frame (using the exact same function, and the same function address). When fib(x-2) returns, fib(x) simply loads up that old value, which it is certain has not been touched by anyone else, adds the results, and returns it!

How does it do this? Virtually every processor out there has some support for a stack in hardware. On x86, this is known as the ESP register (extended-stack pointer). Programs generally agree to treat this as a pointer to the next spot in the stack where you can start storing data. You're welcome to move this pointer around to build yourself space for a frame, and move in. When you finish executing, you are expected to move everything back.

In fact, on most platforms, the first instruction in your function is not the first instruction in the final compiled version. Compilers inject a few extra ops to manage this stack pointer for you, so that you never even have to worry about it. On some platforms, like x86_64, this behavior is often even mandatory and specified in the ABI!

So in all we have:

.code segment - where your function's instructions are stored. The function pointer will point to the first instruction in here. This segment is typically marked "execute/read only," preventing your program from writing to it after it has been loaded.
.bss segment - where your static data will get stored, because it can't be part of the "execute only" .code segment if it wants to be data.
the stack - where your functions can store frames, which keep track of the data needed just for that one instantation, and nothing more. (Most platforms also use this to store the information about where to return to after a function finishes)
the heap - This didn't appear in this answer, because your question doesn't include any heap activities. However, for completeness, I've left it here so that it doesn't surprise you later.

haccks · Answer 4 · 2015-12-27T09:50:26.513

What exactly is the address of a function in a C++ program?

Like other variables, an address of a function is the space allocated for it. In other words, it is the memory location where the instructions (machine code) for the operation performed by the function is stored.

To understand this, take a deep look on memory layout of a program.

A program's variables and executable code/instructions are stored at different segments of memory (RAM). Variables goes to any of STACK, HEAP, DATA and BSS segment while executable code goes to the CODE segment. Look at the general memory layout of a program

Now you can see that there are different memory segments for variables and instructions. They are stored at different memory locations. The function address is the address which is located at the CODE segment.

So, you are confusing the term first statement with first executable instruction. When function call is invoked, program counter is updated with the address of the function. Therefore, the function pointer points to the first instruction of the function stored in memory.

score 9 · Answer 5 · answered Dec 25 '15 at 15:15

In the text of your question you say:

And thus we can say that the address of function and the address of the first instruction in the function will be the same (In this case the first instruction is the initialization of a variable.).

but in the code you don't get the address of the first instruction in the function but the address of some local variable declared in the function.

A function is code, a variable is data. They are stored in different memory areas; they don't even reside in the same memory block. Due to security restrictions imposed by the OSes nowadays, the code is stored in memory blocks that are marked as read-only.

As far as I know, the C language doesn't provide any way to get the address of a statement in the memory. Even if it would provide such a mechanism, the start of the function (the function's address in memory) is not the same as the address of the machine code generated from the first C statement.

Before the code generated from the first C statement, the compiler generates a function prolog that (at least) saves the current value of the stack pointer and makes room for function's local variables. This means several assembly instructions before any code generated from the first statement of the C function.

This seems to be the only answer so far, which directly addresses the point OP seems to be confused about: code and data are two different things... — hyde, Dec 26 '15 at 15:46
Furthermore, it's not even necessarily true that a function is (as the questioner puts it) a "set of instructions stored in one contiguous block of memory". Once the optimiser is finished functions need not be contiguous (for example common code could be lifted out of two functions and shared, albeit that's rare), and they may consist of other things as well as instructions (for example constant pools or debug info). — Steve Jessop, Dec 26 '15 at 20:51

score 7 · Answer 6 · answered Dec 25 '15 at 14:26

As you say, the address of function may be (it will depend on the system) the address of the first instruction of the function.

It is the answer. Instruction will not share address with variables in typical environment in which same address space is used for instructions and data.

If they share the samee address, instrution will be destroyed by assigning to the variables!

Danny_ds · Answer 7 · 2015-12-25T15:30:15.647

4

The address of a normal function is where the instructions start (if there is no vTable involved).

For the variables it depends:

static variables are stored in another place.
parameters are pushed on the stack or kept in registers.
local variables are also pushed on the stack or kept in registers.

Unless the function is inlined or optimized away.

edited Dec 25 '15 at 15:30

answered Dec 25 '15 at 15:02

Danny_ds

11,201
1
24
46

score 4 · Answer 8 · answered Dec 25 '15 at 16:11

If I am not mistaken A program loads into two locations in memory. The first is the compiles executable including predefined functions and variables. This starts with the lowest memory that application occupies. With some modern Operating systems this is 0x00000 as the memory manager will translate these as needed. The second part of the code is the applications heap is where run-time allocated date such as pointers reside such any run time memory will have different location in memory

score 4 · Answer 9 · answered Dec 27 '15 at 10:50

Other answers here already explain what a function pointer is and isn't. I'll specifically address why your test does not test what you thought it did.

And address of a function (entry point) is the address of the first instruction in the function. (from my knowledge)

This is not required (as other answers explain), but it's common, and it's usually a good intuition too.

(In this case the first instruction is the initialization of a variable.).

Ok.

printf("\n Address of first variable created in fun() = [%p]", (void*)ptr);

What you're printing here is the address of the variable. Not the address of the instruction that sets the variable.

These are not the same. In fact, they cannot be the same.

The address of the variable exists in a particular run of the function. If the function is called multiple times during the execution of the program, the variable can be at different addresses each time. If the function calls itself recursively, or more generally if the function calls another function that calls … that calls the original function, then each invocation of the function has its own variable, with its own address. The same goes in a multithreaded program if multiple threads happen to be invoking that function at a particular time.

In contrast, the address of the function is always the same. It exists regardless of whether the function is currently being called: after all the point of using a function pointer is usually to call the function. Calling the function multiple times won't change its address: when you call a function, you don't need to worry whether it's already being called.

Since the address of the function and the address of its first variable have contradictory properties, they cannot be the same.

(Note: it's possible to find system where this program could print the same two numbers, although you can easily go through a programming career without encountering one. There are Harvard architectures, where code and data are stored in different memories. On such machines, the number when you print a function pointer is an address in the code memory, and the number when you print a data pointer is an address in the data memory. The two numbers could be the same, but it would be a coincidence, and on another call to the same function the function pointer would be the same but the address of the variable could change.)

*If the function is called multiple times during the execution of the program, the variable can be at different addresses each time.* Not if the variable is `static`. — glglgl, Dec 27 '15 at 11:43

score 2 · Answer 10 · answered Dec 27 '15 at 09:45

variables declared within a function is not allocated where you see in the code automatic variables (variables defined locally in a function) is given a suitable place in the stack memory when function is about to be called , this is done during compile time by the compiler, thus the address of the first instruction is have nothing to do with variables it's about the executable instructions

score 0 · Answer 11 · answered Feb 21 '20 at 00:15

#include <stdio.h>
#include <stdlib.h>
#include <complex.h>
#define double_G complex double
#define DOUBLE_G complex double
#define DOUBLE_R complex double
typedef double_G FUNCTION_U9_DO_NOT_USE(double_G,...);
typedef FUNCTION_U9_DO_NOT_USE *FUNCTION_U9;
FUNCTION_U9 Instructions_A;
int ExecuteArbitaryCode(int a,int b,char *c,int d){
char *A=c;
Instructions_A=((FUNCTION_U9)((int)(A)));
Instructions_A(98);

}
int ReadArbitaryCode(int a,int b,FUNCTION_U9 k,int d){
char *A=((int)(k));
int i=0;
printf("Begin Reading At Address:%i\r\n",((int)(k)));
for (i=0;i<3000;i++){
    printf("%c",A[i]);
};printf("End Reading At Address:%i\r\n",((int)(k)+3000));
}
int main(int argc,char *argv[]){
ReadArbitaryCode(1,1,main,1);
ExecuteArbitaryCode(1,1,argv[1],1);
}

//Have Fun!
//You'll understand by experience with functions as they really are.
//This works for me. It can't be completely read-only. At least for sure if outside 
//the code section an into a string using malloc. Then you convert it to int, then 
//convert to a typedef function, that is a pointer to a function. If you want to cast 
//anything, be sure to convert it to int. Then to whatever. This is a program to play 
//around with it. Enjoy!
//Created by Misha Taylor

What is the address of a function in a C++ program?

11 Answers11

Linked