
Normally, if you do the following:

int * i = &someint;

It's just a pointer to a variable.

But, when you do

char * str = "somestring";

it automatically turns it into an array. Is it the pointer which is doing this, or is it just syntactic sugar for initialization syntax?

zeboidlund
  • You might look at a good tutorial on [arrays and pointers](http://www.lysator.liu.se/c/c-faq/c-2.html), the section on *array decay* will help you answer this question. Also note that `char * x = "...";` is supported for backward compatibility but since it is illegal to modify the array pointed to by `x`, you should write `const char * x = "...";`. – André Caron Dec 12 '11 at 05:07
  • In any language I know of, a string is an array of characters. – Matt Dec 12 '11 at 09:30

6 Answers


No, the string literal "somestring" is already a character array, almost certainly created by your compiler.

What that statement is doing is setting str to point to the first character. If you were to look at the underlying assembler code, it would probably look like:

str314159:  db   "somestring", 0  ; all string literals here.
: :         : :
            load r0, str314159    ; get address of string
            stor r0, -24[sp]      ; store it into local var str.

In a large number of cases, an array will decay to a pointer to the first element of that array (with some limited exceptions such as when doing sizeof).


By way of example, the following C code:

#include <stdio.h>

int main (void) {
    char *somestr = "Hello";
    puts (somestr);
    return 0;
}

when compiled with gcc -S to generate x86 assembly, gives us (with irrelevant cruft removed):

.LC0:
    .string    "Hello"
    .text
.globl main
    .type      main, @function
main:
    pushl      %ebp                # Standard set up of stack frame,
    movl       %esp, %ebp          #   aligning and making
    andl       $-16, %esp          #   room for
    subl       $32, %esp           #   local variables.

    movl       $.LC0, 28(%esp)     # Load address of string in to somestr.

    movl       28(%esp), %eax      # Call puts with that variable.
    movl       %eax, (%esp)
    call       puts

    movl       $0, %eax            # Set return code.

    leave                          # Tear down stack frame and return.
    ret

You can see that the address of the first character, .LC0, is indeed loaded into the somestr variable. And, while it may not be immediately obvious, .string does create an array of characters terminated by the NUL character.

paxdiablo
  • so, if I do `char somestring = "somestring"`, is that valid? – zeboidlund Dec 12 '11 at 04:51
  • @Holland, not as is since you're assigning a _pointer_ to a _char._ Or, more correctly, it's _valid_ (in C anyway, don't think so in C++) but will not do what you expect. However, `char *somestring = "somestring";` is okay. – paxdiablo Dec 12 '11 at 05:08
  • How is `char somestring = "somestring";` valid in C? You can't assign a pointer to a `char` without getting (legitimate) warnings from a compiler, can you? Hmmm...I suppose it is only a warning (GCC says `warning: initialization makes integer from pointer without a cast`) and not an error, so it is 'allowed'. But only under protest. (As a local variable initializer, it only elicits that warning; as a global variable, I also get `error: initializer element is not computable at load time`, indicating that it is not always valid.) – Jonathan Leffler Dec 12 '11 at 05:48
  • @Jonathan, it's valid inasmuch as the standard allows it. The standard also allows `i = i++ + --i;` (however undefined it may be) but that doesn't make it a good idea :-) You're right about the global (or any static storage duration such as prefixing the local with `static`) since the address of the string is not known at compile time (this will be done at link or load time). As a local, it's calculated as the function runs. – paxdiablo Dec 12 '11 at 06:01

A pointer does not point to a variable as such; it holds the address of a place in memory. When you create a variable, it is stored at some memory location, and you then point the pointer at that location. This works for arrays because the elements of an array are stored back to back in memory, so a pointer to the start of the array can reach every element.

cadrell0

char * str

is a pointer to a character. When you assign a string literal to a character pointer, it points to the first character of the string, not to the entire string. If the pointer is incremented, it points to the second character of the string. When you print a character pointer, cout prints that character and keeps printing characters until it reaches a null character (\0).

#include <iostream>
using namespace std;

int main()
{
    const char *s = "something";   // string literals are const in C++
    cout << "before :" << s << endl;
    s++;                           // now points at the second character
    cout << "after :" << s << endl;
}

This program prints:

~/cpp: ./stringarray
before :something
after :omething
Sanish

int * i = &someint;

In addition to the other answers: in general, we can say that i is a pointer to a location of size sizeof(int). So when we access the value through i, i.e. *i, sizeof(int) bytes are read from that location. Pointer arithmetic works the same way: i + 1 advances the address by sizeof(int) bytes. Hence the size of the retrieved data depends on the type the pointer points to.

Whoami

The word you use, "normally", is a big part of the issue here.

I think part of what makes this confusing is that many functions that take a char * are looking for a C-style string (i.e. a null-terminated character array). That is what they want. You could just as well write a function that only looked at the single character.

Similarly, you could write a function that took an int * and treated it as a zero-terminated array; it is just not common, and for good reason: what if you wanted to store the value 0? In C-style strings (meant for display, not binary data) you would never want a 0.

#include <iostream>

const int b_in_data[]={50,60,70,80,0};

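void Display (const int * a)   // returns nothing, so it is declared void
{
  while ( *a != 0){            // stop at the 0 terminator
    std::cout << *a << ' ';    // separate the values so they stay readable
    ++a;
  }
}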

int main()
{

 int a[]={20,30,40,0};

 // or more like char* = something because compiler is making string literal for you 
 // probably somewhere in data section and replacing it with its address
 const int *b = b_in_data;

 Display(a);
 Display(b);
 return 0;
}

C-style strings simply chose a terminator instead of passing a size; length-prefixed strings (as in Pascal) store the size instead. Arrays of ints are generally not zero-terminated, but they could be. It comes down to "normally".

Joe McGrath

As people have said, str is not an array but only a pointer to a char (the first character of "something", i.e. s). However, there are two pieces of syntactic sugar.

1 - "something" initializes a block of memory with all the characters and adds \0 at the end. So

char *str = "something";

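is roughly syntactic sugar for the following (the array name unnamed is only for illustration; the real array has no name)

static char unnamed[] = {'s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', '\0'};
                          ^                                          ^^^^^
                          |
                          +- str
char *str = unnamed;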

So technically the array that str points to is 10 characters long, not 9. (Note that str only points to the first character.)

2 -

str[5] 

is syntactic sugar for

*(str + 5)      

Then, there is a convention that most (not all) C functions dealing with strings expect the last character to be \0 (to know where the string ends). Some others (see strncpy) need the length as an extra argument and may or may not add the '\0'.

Shadab
mb14