
So I am doing CS50 Lecture 4, Memory.

David says that when we declare int *x; as a pointer, we have to assign it an address before we can store a value (e.g., 45 or 23) through it. He also says that if you don't initialize a pointer and then try to put a value in it by dereferencing it, as shown for y below, you are asking the computer to store the value at a bogus or random address.

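#include <stdlib.h>

int main(void)
{
    int *x;   // uninitialized: holds an indeterminate value
    int *y;   // uninitialized: holds an indeterminate value

    x = malloc(sizeof(int));   // x now holds the address of a heap int (or NULL on failure)

    *x = 42;   // OK if malloc succeeded
    *y = 13;   // undefined behavior: y was never given a valid address
}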

But what is a bogus address? When I declare int *x;, does x already have an address in it? How is that possible? I understand that the memory location where pointer x's value will be stored might hold remnants of previous operations, but I don't understand how there can be an address there.

Barmar
  • It has an "indeterminate" value - that is, it is not defined by the C standard. "Garbage" value in the common jargon. "*but i don't understand how there can be an address there*" - this part is not clear to me. Pointers store addresses. – Eugene Sh. Jul 26 '21 at 17:27
  • It could be anything. It might point to some part of memory being used by your program, which could cause misbehavior. It could also point outside your process memory, then you'll get a segmentation violation crash. – Barmar Jul 26 '21 at 17:29
  • It's random junk in that memory location, possibly left over from code that happened to use that memory address before. If your pointer is not explicitly initialized, it may store that junk, pointing to an unknown memory location that may not be under your control. – xxbbcc Jul 26 '21 at 17:30
  • @EugeneSh. To quote him, he said "it is a garbage value that looks like an address but not a valid address." I didn't understand what that meant. – random8mile Jul 26 '21 at 17:30
  • I think what you don't understand is that an "address" is not a type of data. Everything in memory is just numbers. They become addresses, strings, etc. based on what you do with them. So when you dereference a pointer, it uses the number there as an address. – Barmar Jul 26 '21 at 17:31
  • "looks like an address" is pretty meaningless. Any number in the range of the address bus width "looks like an address". So I would guess he meant - "this is a garbage value, even if it might look like a valid address." – Eugene Sh. Jul 26 '21 at 17:31
  • Ah, I think I got your confusion now. Yes, the *variable* `x` of a pointer type has its own address and it is well defined and can be taken using `&x`. But the pointer variable itself has a certain value - which is an address where it is supposed to be pointing to. – Eugene Sh. Jul 26 '21 at 17:40
  • @Barmar So when I create int *x;, you mean that whatever memory location has been allocated to that pointer (or rather, wherever we are storing that pointer) already holds remnants of previous operations, in binary, which could be interpreted as an address? Then with malloc we are asking the computer to give us a valid address, i.e., a memory location that is free to use. Is that correct? – random8mile Jul 26 '21 at 17:53
  • @EugeneSh. Now my question is: when I declare int *x;, does it already have a certain value that it is pointing to, even before I use malloc? That's what I feel the instructor is saying: when a pointer is declared, it already has some value in it which isn't a valid address. – random8mile Jul 26 '21 at 17:57
  • @random8mile Yes. It's similar to when you do `int x;` It contains whatever was in that memory. If you use it before assigning to it, you get that garbage. – Barmar Jul 26 '21 at 18:31
  • @Barmar But why doesn't the compiler compile the code and let me see what that remnant value is? I tried with both int *x; and int x; – random8mile Jul 27 '21 at 11:48
  • I don't understand what you mean by that. If you want to see what the remnant value is, print the pointer itself, as shown in Vlad's answer. – Barmar Jul 27 '21 at 14:17
  • Who said the compiler doesn't compile the code? – Barmar Jul 27 '21 at 14:17

5 Answers


First, remember that x and y are variables that exist independently of whatever they point to. The initial values of x and y are indeterminate - they could be 0x00000000, they could be 0xdeadbeef, they could be a bit pattern that doesn't correspond to a valid address value at all.

The space for the x and y variables has to be taken from somewhere, and since memory isn't infinite, memory locations get reused; some memory locations get reused a lot. Memory doesn't automatically get erased (note 1) when you're done with it in most implementations, so when you create a new object, it will contain the bit pattern of whatever was last written to those bytes (note 2).

C has a concept of a lifetime for objects, which is the period of your program's execution where storage is guaranteed to be reserved for that object. A pointer is valid if it stores the address of an object during that object's lifetime. Valid pointer values are obtained in one of two ways:

  • using the & operator on an object during that object's lifetime
  • calling malloc, calloc, or realloc to dynamically allocate space for an object, as you do for x (note 3).

For example:

#include <stdio.h>

void foo( void )
{
  int *ptr; // ptr is initially indeterminate and invalid

  for ( int i = 0; i < 10; i++ )
  {
    ptr = &i;  // i's lifetime is the execution of the for statement
    printf( "%d = %d\n", *ptr, i ); // ptr is valid within the loop
  }

  // ptr still stores the address of i, but i's lifetime has ended,
  // so ptr is *no longer valid* - attempting to read or write *ptr now
  // will lead to undefined behavior
}

int main( void )
{
  foo();
  return 0;
}

After i's lifetime has ended, the space that was reserved for it can be used by something else. If we try to read or write it through ptr after the loop has finished, the result may not be what we expect. The behavior of doing this is undefined, meaning the compiler and runtime environment aren't required to handle the situation in any particular way. It may work as we expect, we may corrupt data somewhere, we may cause a runtime error, or anything else can happen.

Similarly, executing

*y = 13;

in your program will have undefined behavior, because y does not store the address of an object in your program during that object's lifetime. Literally anything can happen at this point - your program can appear to work as intended, you can corrupt data elsewhere in your program, you can cause your program to branch off into a random function, you can cause a runtime error, or literally anything else can happen. And the result can be different each time you run it.

Edit

Addressing a question in the comments:

Are you referring to pointers here? Can pointers be considered objects, or is it just ints and chars that are called objects?

Yes, the pointer variables x and y are objects (in the C sense that they're regions of memory that can store values). To better illustrate this, I wrote the following:

#include <stdio.h>
#include <stdlib.h>
#include "dumper.h"   // the author's own dump utility (not shown)

int main( void )
{
  int *x;
  int *y;
  int a;

  char *names[] = { "a", "x", "y", "*x", "*y" };
  void *addrs[] = { &a, &x, &y, NULL, NULL };
  size_t sizes[] = { sizeof a, sizeof x, sizeof y, sizeof *x, sizeof *y };

  puts( "Initial states of a, x, and y:" );
  dumper( names, addrs, sizes, 3, stdout );

  x = calloc( 1, sizeof *x ); // makes sure *x is initialized to 0
  if ( x )
  {
    addrs[3] = x;
    puts( "States of a, x, and y after allocating memory for x" );
    dumper( names, addrs, sizes, 4, stdout );

    *x = 0x11223344;
    puts( "States of a, x, y, and *x after assigning *x" ); 
    dumper( names, addrs, sizes, 4, stdout );
  }

  y = &a;
  addrs[4] = y;
  puts( "States of a, x, y, *x, and *y after assigning &a to y" );
  dumper( names, addrs, sizes, 5, stdout );

  *y = 0x55667788;
  puts( "States of a, x, y, *x, and *y after assigning to *y" );
  dumper( names, addrs, sizes, 5, stdout );

  free( x );

  return 0;
}
 

dumper is a little utility I wrote to dump the address and contents of the objects to a specified output stream.

After building and running the code, I get this output for the initial states of my variables:

Initial states of a, x, and y:
           Item         Address   00   01   02   03
           ----         -------   --   --   --   --
              a  0x7ffee3bc59f4   2c   b3   0c   1b    ,...

              x  0x7ffee3bc5a00   01   00   00   00    ....
                 0x7ffee3bc5a04   00   00   00   00    ....

              y  0x7ffee3bc59f8   80   5b   bc   e3    .[..
                 0x7ffee3bc59fc   fe   7f   00   00    ....

The variable a lives at address 0x7ffee3bc59f4 and takes up 4 bytes - its initial contents for this run are 0x1b0cb32c (x86 is little-endian, so bytes are ordered from least-significant to most-significant). Since a isn't explicitly initialized, its initial contents are indeterminate - each time I run this program the initial value of a will likely be different (as will its address - as a defense against malware, most OSes randomize locations from run to run).

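The variable x starts at address 0x7ffee3bc5a00 and takes up 8 bytes, 0x7ffee3bc5a00 through 0x7ffee3bc5a07 (an object's address is the address of its lowest-addressed byte; the dump just shows four bytes per row). Similarly, the variable y starts at address 0x7ffee3bc59f8 and also takes up 8 bytes. Like a, the initial contents of x and y are indeterminate and will vary from run to run. In this run, x happens to hold 0x1, which is not a usable address, while y holds 0x7ffee3bc5b80, which looks like a plausible stack address even though nothing in this program set it.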

After allocating space for an int object that x will point to, I have this:

States of a, x, and y after allocating memory for x
           Item         Address   00   01   02   03
           ----         -------   --   --   --   --
              a  0x7ffee3bc59f4   2c   b3   0c   1b    ,...

              x  0x7ffee3bc5a00   a0   25   50   1e    .%P.
                 0x7ffee3bc5a04   c2   7f   00   00    ....

              y  0x7ffee3bc59f8   80   5b   bc   e3    .[..
                 0x7ffee3bc59fc   fe   7f   00   00    ....

             *x  0x7fc21e5025a0   00   00   00   00    ....

The variable x now stores the value 0x7fc21e5025a0, which is the address of a block of memory large enough to store an int value. Since I used calloc to allocate the memory, the initial contents of it are all-bits-0. I can now assign a new int value to that object through the expression *x, which gives me:

States of a, x, y, and *x after assigning *x
           Item         Address   00   01   02   03
           ----         -------   --   --   --   --
              a  0x7ffee3bc59f4   2c   b3   0c   1b    ,...

              x  0x7ffee3bc5a00   a0   25   50   1e    .%P.
                 0x7ffee3bc5a04   c2   7f   00   00    ....

              y  0x7ffee3bc59f8   80   5b   bc   e3    .[..
                 0x7ffee3bc59fc   fe   7f   00   00    ....

             *x  0x7fc21e5025a0   44   33   22   11    D3".

So I've updated the int object that x points to (i.e., stores the address of).

Finally, I set y to point to a, giving me:

States of a, x, y, *x, and *y after assigning &a to y
           Item         Address   00   01   02   03
           ----         -------   --   --   --   --
              a  0x7ffee3bc59f4   2c   b3   0c   1b    ,...

              x  0x7ffee3bc5a00   a0   25   50   1e    .%P.
                 0x7ffee3bc5a04   c2   7f   00   00    ....

              y  0x7ffee3bc59f8   f4   59   bc   e3    .Y..
                 0x7ffee3bc59fc   fe   7f   00   00    ....

             *x  0x7fc21e5025a0   44   33   22   11    D3".

             *y  0x7ffee3bc59f4   2c   b3   0c   1b    ,...

The value stored in the variable y is the address of the variable a: 0x7ffee3bc59f4. As you can see, the expression *y holds the same value as the variable a. I can now change the value of a by writing to *y, which leaves us with:

States of a, x, y, *x, and *y after assigning to *y
           Item         Address   00   01   02   03
           ----         -------   --   --   --   --
              a  0x7ffee3bc59f4   88   77   66   55    .wfU

              x  0x7ffee3bc5a00   a0   25   50   1e    .%P.
                 0x7ffee3bc5a04   c2   7f   00   00    ....

              y  0x7ffee3bc59f8   f4   59   bc   e3    .Y..
                 0x7ffee3bc59fc   fe   7f   00   00    ....

             *x  0x7fc21e5025a0   44   33   22   11    D3".

             *y  0x7ffee3bc59f4   88   77   66   55    .wfU

There's nothing magic about pointer variables - they're just chunks of memory that store a certain type of value (an address). Different pointer types may have different sizes and/or representations (e.g., an int * variable may look different from a char * variable, which may look different from a struct foo * variable). The only rules are

  • char * and void * have the same size and alignment;
  • Pointers to qualified types have the same size and alignment as pointers to their unqualified equivalents (e.g., const int * and int * have the same size and alignment);
  • All struct pointer types have the same size and alignment (e.g., struct foo * and struct bar * look the same);
  • All union pointer types have the same size and alignment.

Operations on pointer values are special, and the syntax for them can be confusing. But pointers are just another data type, and pointer variables are just another kind of object.


  1. That is, set to all-bits-0 or some other well-defined "not a value" bit pattern.
  2. We're not going to get into the distinction between virtual and physical memory here.
  3. You're not allocating space for x itself - you're allocating space for an int object that x will point to.
John Bode
  • _so when you create a new object, it will contain the bit pattern of whatever was last written to those bytes._ @JohnBode Are you referring to pointers here? Can pointers be considered objects, or is it just ints and chars that are called objects? – random8mile Jul 27 '21 at 15:31
  • Thank you for taking the time to write this. Unfortunately, I am just a beginner and almost all of it went over my head. I hope to understand it one day. @JohnBode – random8mile Aug 02 '21 at 12:02

I understand that the memory location where pointer x's value will be stored might hold remnants of previous operations, but I don't understand how there can be an address there.

That's why you see terms like "bogus address" or "random address". Garbage is a bogus address. Garbage, if understood to be an address, is a random address.

There could only be a valid address there by luck (whether good or bad is another question). But if there is random garbage and you use it as an address, it will likely be a "bogus address" or a "random address".

David Schwartz

These two variables with automatic storage duration

int *x;
int *y;

of course have their addresses. You can output their addresses the following way

printf( "&x = %p\n", ( void * )&x );
printf( "&y = %p\n", ( void * )&y );

However, the variables x and y themselves were not initialized and have indeterminate values. So dereferencing these pointers, as in this statement

*y = 13;

results in undefined behavior.

If you want to dereference a pointer, it must point to an object, as is done in these statements

x = malloc(sizeof(int));

*x = 42;

After the first statement above, the pointer x points to memory allocated for an object of type int. So by dereferencing the pointer

*x = 42;

you can change the object.

Vlad from Moscow

I understand that the memory location where pointer x's value will be stored might hold remnants of previous operations, but I don't understand how there can be an address there.

Suppose the previous operations used that memory as a uint32_t object (an unsigned 32-bit integer), and suppose an int * in your C implementation is also 32 bits. As an example, suppose some address on your system is 0x103F0. When the memory was used for a uint32_t, it might have been used to store the unsigned integer value 66544. The hexadecimal for 66544 is 0x103F0. So the memory will contain 0x103F0, which is the same as the hypothetical address.
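The hexadecimal-to-decimal conversion works out digit by digit (F = 15):

```latex
\mathtt{0x103F0} = 1 \cdot 16^4 + 0 \cdot 16^3 + 3 \cdot 16^2 + 15 \cdot 16^1 + 0 \cdot 16^0 = 65536 + 768 + 240 = 66544
```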

Every valid address is some particular setting of bits (see footnote 1). And every setting of bits is some unsigned integer. So there can easily be bits in the uninitialized memory for x that represent an address. This can happen with other types, too. The memory for x might have been used as an array of char or as a float, and the bits used for those could also be the same bits used to represent 0x103F0.

Another problem is that when you define int *x; and then use x, modern compilers do not just mechanically reserve some memory for x and then load its contents from memory. They try to optimize your program (unless optimization is turned off). When doing this, they attempt to seek the “best” program that implements the defined behavior of your source code. However, when you use an uninitialized variable, the value of that variable is not defined by the C standard. Depending on circumstances, the behavior of using it may not be defined at all by the standard. Then your program has no defined behavior, and the “best” set of instructions that implements the defined behavior of that part of your source code is no instructions at all—the compiler might just eliminate that part of your program or might simply replace it with other parts of your program or exhibit other behaviors that are surprising to new programmers.

Footnote

1 Sometimes there can be multiple settings of bits that represent the same address, as when some bits are unused or memory is addressed in a base-and-offset scheme with overlapping segments.

Eric Postpischil

Here int *y is a wild pointer (and so is int *x until the result of malloc is assigned to it), which means it is uninitialized and may hold a non-NULL garbage value that is not a valid address.

Aryaman Kumar