Here is some code exploring the extra copying that can result from assigning to an element of a vector (in this case inside a for loop).
```r
# populate a vector with a million random numbers
n = 10^6
v=runif(n)
# vectorized version: fast
vv<-v*v;
m<-mean(vv); m
# for loop: slow
tracemem(vv)
for(i in 1:length(v)) { vv[i]<-v[i]*v[i] };
m<-mean(vv); m
```
This outputs:
```
> vv<-v*v;
> m<-mean(vv); m
[1] 0.3329162
> # for loop: slow
> tracemem(vv)
[1] "<0x000007ffff560010>"
> for(i in 1:length(v)) { vv[i]<-v[i]*v[i] };
tracemem[0x000007ffff560010 -> 0x000007fffe570010]:
> m<-mean(vv); m
[1] 0.3329162
```
This seems to indicate that the vector is copied on the very first iteration of the loop.
Note: this is a follow-up to my earlier question "Why is vectorization faster", to this answer to it, and to this comment on the answer.
To confirm the copying, I did the first iteration outside the loop body:
```r
v=runif(n)
# vectorized version: fast
vv<-v*v;
m<-mean(vv); m
# for loop: slow
tracemem(vv)
vv[1]<-v[1]*v[1]
tracemem(vv)
for(i in 2:length(v)) { vv[i]<-v[i]*v[i] };
m<-mean(vv); m
```
This gives the following output:
```
> vv<-v*v;
> m<-mean(vv); m
[1] 0.33385
> # for loop: slow
> tracemem(vv)
[1] "<0x000007fffef80010>"
> vv[1]<-v[1]*v[1]
tracemem[0x000007fffef80010 -> 0x000007fffddc0010]:
> tracemem(vv)
[1] "<0x000007fffddc0010>"
> for(i in 2:length(v)) { vv[i]<-v[i]*v[i] };
> m<-mean(vv); m
[1] 0.33385 # (different as I generated the random nos again)
```
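One way to narrow down which statement makes the copy necessary is to repeat the single-element assignment with and without a preceding `mean(vv)` call, since that call sits between creating `vv` and modifying it in both experiments above. The sketch below is an illustrative variation, not part of the original experiment; it assumes R was built with memory profiling enabled (`capabilities("profmem")` is `TRUE`), because `tracemem()` stops with an error otherwise. Run it and compare whether a `tracemem[... -> ...]` line appears in each case.

```r
# Compare whether a single element assignment copies vv,
# with and without passing vv to mean() first.
n <- 10^6
v <- runif(n)

# Case A: no function call between creating vv and modifying it
vv <- v * v
tracemem(vv)
vv[1] <- v[1] * v[1]   # watch for a "tracemem[... -> ...]" line here
untracemem(vv)

# Case B: pass vv to mean() before modifying it, as in the first example
vv <- v * v
m <- mean(vv)
tracemem(vv)
vv[1] <- v[1] * v[1]   # compare with Case A
untracemem(vv)
```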
After reading joran's answer and this nabble discussion thread, I became familiar with the idea that R may copy a vector, for example when an assignment changes its type:
```
> x = 1:10
> tracemem(x)
[1] "<0x00000000118ba4e0>"
> x[5] = 6
tracemem[0x00000000118ba4e0 -> 0x0000000010d03568]:
> x = 1:10 # starts off as integer
> tracemem(x)
[1] "<0x00000000118ba538>"
> x[5] = 6L # setting integer ok
> x[5] = 6 # setting floating point changes type
tracemem[0x00000000118ba538 -> 0x0000000010d03568]:
> x[6] = 7 # it's now floating point, setting floating point again ok
> x[7] = "asdf" # setting string changes type once more, this tanks on a large array
tracemem[0x0000000010d03568 -> 0x0000000010d03610]:
```
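The type changes in this transcript can be confirmed with `typeof()`. This small illustrative snippet (not from the thread) repeats the same assignments and prints the storage type after each one:

```r
x <- 1:10
typeof(x)        # "integer"
x[5] <- 6L       # integer into an integer vector: type unchanged
typeof(x)        # "integer"
x[5] <- 6        # double literal: x is coerced to double
typeof(x)        # "double"
x[7] <- "asdf"   # character value: x is coerced again
typeof(x)        # "character"
```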
So I have a rough idea of what's going on, but why is there a copy of vv in my first example (or what mistake have I made in interpreting the output), when vv is already a vector of floating-point numbers (doubles)?
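A quick check along the same lines suggests that coercion is not involved in the first example: `runif()` returns doubles, and so does each `v[i] * v[i]`, so the assignment in the loop never needs to change the type of `vv`. A minimal sketch, using a shorter vector than the original for brevity:

```r
v <- runif(10)
vv <- v * v
typeof(vv)            # "double"
typeof(v[1] * v[1])   # "double": same type as vv, so no coercion is needed
vv[1] <- v[1] * v[1]
typeof(vv)            # still "double"
```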