How can I understand the concept of pointers (*) and address-of (&) operators?

Question

I am trying to understand the significance of these two operators, so I wrote this code just for that purpose.

#include <stdio.h>
#include <string.h>

int main()
{
    char *mnemonic, *operands;

    mnemonic = "add";
    operands = "five to two";

    analyse_inst(mnemonic, operands);

}

void analyse_inst(char mnemonic, char operands)
{
    printf("%s", mnemonic);
    printf("%s", operands);
}

However, I noticed that it wouldn't work unless I change the arguments of analyse_inst() function to analyse_inst(char * mnemonic, char * operands), which means that I will be passing pointers to the function. But why is that required?

I also looked up "passing by reference." According to tutorialspoint.com, it is defined as follows:

The call by reference method of passing arguments to a function copies the address of an argument into the formal parameter. Inside the function, the address is used to access the actual argument used in the call. It means the changes made to the parameter affect the passed argument.

From that, I understood that passing a variable by reference and then modifying its value means that the same variable outside the function is changed as well, whereas passing a variable by value does not change the variable outside the function.

Am I going wrong anywhere?

How can I modify my code so that I am passing the two variables by reference?

(P.S. I have read other Stack Overflow threads on the same topic, but I would appreciate it if anyone could explain it in the context of the code I wrote)

@MikeKinghan Please don't recommend that list to anyone, it is complete trash and in desperate need of a major clean-up. See https://meta.stackoverflow.com/questions/355588/the-c-book-list-has-gone-haywire-what-to-do-with-it — Lundin, Feb 27 '19 at 12:01

score 2 · Accepted Answer · answered Feb 27 '19 at 11:58

which means that I will be passing pointers to the function. But why is that required?

Because what you have in main() are pointers, and what printf("%s", ...) expects is a char*.

"Pass by reference" is a broad term in programming, meaning passing along an address rather than a copy of the object. In your case, you pass a pointer to the first element of each string, rather than making a copy of the whole string, since that would waste execution time and memory.

So while the strings themselves could be said to be "passed by reference", strictly speaking C actually only allows parameters to be passed by value. The pointers themselves are passed by value. Your function parameters will be copies of the pointers you have allocated in main(). But they point at the same strings as the pointers in main() do.

From that, I got that passing a variable by reference and then modifying that value would mean that the same variable outside the function would be changed as well;

Indeed, you could change the string from inside the function through the pointer, and that would affect the string in main(). But in this case you haven't allocated any memory to modify: you would be attempting to modify a string literal "...", which is undefined behavior. If you want to modify the strings, declare them as arrays in main(): char mnemonic[] = "add";

Now as it turns out, whenever you use an array like the one in my example in an expression, it "decays" in most cases into a pointer to its first element. So we couldn't actually pass the array by value to the function even if we wanted to, because the C language implicitly converts it to a pointer to the first element.

You can play around with this code:

#include <stdio.h>
#include <string.h>

void analyse_inst(char* mnemonic, char* operands);

int main()
{
    char mnemonic[] = "add";
    char operands[] = "five to two";

    analyse_inst(mnemonic, operands);
    printf("%s\n", mnemonic);
}

void analyse_inst(char* mnemonic, char* operands)
{
    printf("%s ", mnemonic);
    printf("%s\n", operands);

    strcpy(mnemonic, "hi");
}

score 2 · Answer 2 · answered Feb 27 '19 at 12:01

When you write something like char *mnemonic, you are declaring a pointer variable (a variable that holds the address of another object). Because the pointed-to type is char, it can only hold the address of a char.

Next, you wrote mnemonic = "add". Here "add" is a string, that is, an array of characters, and mnemonic points to the base address (the first element) of that array.

When you call the function, you pass the addresses of these char arrays, so you need to change void analyse_inst(char mnemonic, char operands) to void analyse_inst(char *mnemonic, char *operands) to receive those addresses in the respective pointer parameters. The reason is the same: you need pointer variables to hold addresses.

The & operator returns the address of a variable, that is, a reference to the memory location in which the variable is stored.

Hope this will help.

score 1 · Answer 3 · answered Feb 27 '19 at 11:54

Strings in C are stored as arrays of characters, terminated by a character with the value '\0' ("NUL"). You cannot pass arrays around directly, so a pointer to the first character is used instead, which is why the function must take char * parameters in order to access the strings.

A character is typically much smaller than a pointer (think 8 vs 32/64 bits), so you cannot squeeze a pointer value into a single character.

C does not have pass by reference; it's pass by value only. Sometimes that value is as close to a reference as the language can come (i.e. a pointer), but then that pointer is in turn passed by value.

Consider this:

#include <stdio.h>

static void put_next(const char *s)
{
  putchar(*s++);
}

int main(void)
{
  const char *string = "hello";
  put_next(string);
  put_next(string);
}

This prints hh, because put_next is passed the same value of string each time. The parameter s is a different variable holding a copy of that value, so incrementing it inside the function has no effect on string. The incremented value is local to the function and is discarded when the function returns.

Ah, thanks. That makes much more sense. Since it's arrays I am passing, the code `analyse_inst(&mnemonic, &operands)` would not be allowed, right? — Tina, Feb 27 '19 at 11:59
@Tina You are not passing arrays; they cannot be passed directly to a function in C. What happens is that the name of the array is, in most contexts, converted to a pointer to the first element of the array. — unwind, Feb 27 '19 at 12:13

score 1 · Answer 4 · answered Feb 27 '19 at 18:23

I will discuss things in the context of your code, but I want to get some basics out of the way first.

In a declaration, the unary * operator indicates that the thing being declared has pointer type:

T *p;       // for any type T, p has type "pointer to T"
T *p[N];    // for any type T, p has type "N-element array of pointer to T"
T (*p)[N];  // for any type T, p has type "pointer to N-element array of T"
T *f();     // for any type T, f has type "function returning pointer to T"
T (*f)();   // for any type T, f has type "pointer to function returning T"

The unary * operator has lower precedence than the postfix [] subscript and () function operators, so if you want a pointer to an array or a function, the * must be explicitly grouped with the identifier.

In an expression, the unary * operator dereferences the pointer, allowing us to access the pointed-to object or function:

int x;
int *p;
p = &x;  // assign the address of x to p
*p = 10; // assigns 10 to x via p - int = int

After the above code has executed, the following are true:

 p == &x       // int * == int *
*p ==  x == 10 // int   == int   == int

The expressions p and &x have type int * (pointer to int), and their value is the (virtual) address of x. The expressions *p and x have type int, and their value is 10.

A valid¹ object pointer value is obtained in one of three ways (function pointers are also a thing, but we won't get into them here):

using the unary & operator on an lvalue² (p = &x;);
allocating dynamic memory via malloc(), calloc(), or realloc();
and, what is relevant for your code, using an array expression that is not the operand of the unary & or sizeof operator.

Except when it is the operand of the sizeof or unary & operator, or is a string literal used to initialize a character array in a declaration, an expression of type "N-element array of T" is converted ("decays") to an expression of type "pointer to T", and the value of the expression is the address of the first element of the array³. So, if you create an array like

int a[10];

and pass that array expression as an argument to a function like

foo( a );

then before the function is called, the expression a is converted from type "10-element array of int" to "pointer to int", and the value of a is the address of a[0]. So what the function actually receives is a pointer value, not an array:

void foo( int *a ) { ... }

String literals like "add" and "five to two" are array expressions - "add" has type "4-element array of char" and "five to two" has type "12-element array of char" (an N-character string requires at least N+1 elements to store because of the string terminator).

In the statements

mnemonic = "add";
operands = "five to two";

neither string literal is the operand of the sizeof or unary & operators, and they're not being used to initialize a character array in a declaration, so both expressions are converted to type char * and their values are the addresses of the first element of each array. Both mnemonic and operands are declared as char *, so this is fine.

Since the types of mnemonic and operands are both char *, when you call

analyse_inst( mnemonic, operands );

the types of the function's formal arguments must also be char *:

void analyse_inst( char *mnemonic, char *operands ) 
{
  ...
}

As far as the "pass by reference" bit...

C passes all function arguments by value. That means the formal argument in the function definition is a different object in memory from the actual argument in the function call, and any changes made to the formal argument are not reflected in the actual argument. Suppose we write a swap function as:

void swap( int a, int b )
{
  int tmp = a;
  a = b;
  b = tmp;
}

int main( void )
{
  int x = 2;
  int y = 3;

  printf( "before swap: x = %d, y = %d\n", x, y );
  swap( x, y );
  printf( "after swap: x = %d, y = %d\n", x, y );
  ...
}

If you compile and run this code, you'll see that the values of x and y don't change after the call to swap - the changes to a and b had no effect on x and y, because they're different objects in memory.

In order for the swap function to work, we have to pass pointers to x and y:

void swap( int *a, int *b )
{
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

int main( void )
{
  ...
  swap( &x, &y );
  ...
}

In this case, the expressions *a and *b in swap refer to the same objects as the expressions x and y in main, so the changes to *a and *b are reflected in x and y:

 a == &x,  b == &y
*a ==  x, *b ==  y

So, in general:

void foo( T *ptr ) // for any non-array type T
{
  *ptr = new_value(); // write a new value to the object `ptr` points to
}

void bar( void )
{
  T var;
  foo( &var ); // write a new value to var
}

This is also true for pointer types - replace T with a pointer type P *, and we get the following:

void foo( P **ptr ) // for any type P
{
  *ptr = new_value(); // write a new value to the object `ptr` points to
}

void bar( void )
{
  P *var;
  foo( &var ); // write a new value to var
}

In this case, var stores a pointer value. If we want to write a new pointer value to var through foo, we must still pass a pointer to var as the argument. Since var has type P *, the expression &var has type P **.

¹ A pointer value is valid if it points to an object within that object's lifetime.
² An lvalue is an expression that refers to an object such that the object's value may be read or modified.
³ Believe it or not, there is a good reason for this rule, but it means that array expressions lose their "array-ness" under most circumstances, leading to much confusion among people first learning the language.
