While the comments about RStudio holding extra references are probably correct, the book appears to be outdated.
The last commit to the source code for that page was on 2019-06-25, which predates the release of R v4.0.0.
The R change log lists the following change under v4.0.0:
Reference counting is now used instead of the NAMED mechanism for determining when objects can be safely mutated in base C code. This reduces the need for copying in some cases and should allow further optimizations in the future. It should help make the internal code easier to maintain.
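Because the results below depend on which side of the 4.0.0 release you are on, it may help to check your setup before reproducing them. The sketch below assumes only base R: getRversion() tells you whether reference counting is in effect, and capabilities("profmem") tells you whether tracemem() will work at all. Many binary distributions enable memory profiling, but a from-source build only does so when configured with --enable-memory-profiling.

```r
# Reference counting replaced the NAMED mechanism in R 4.0.0 (per the NEWS
# entry quoted above), so copy counts differ before and after that release.
uses_refcount <- getRversion() >= "4.0.0"

# tracemem() only works when R was built with memory profiling enabled.
has_tracemem <- capabilities("profmem")

cat("R", as.character(getRversion()),
    "| reference counting:", uses_refcount,
    "| tracemem available:", has_tracemem, "\n")
```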
R v3.6.3
Indeed, if you run the example code under R v3.6.3 (the version just prior to v4.0.0):
#> R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}
cat(tracemem(x), "\n")
#> <000000002457F7D0>
for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x000000002457f7d0 -> 0x0000000024697c90]:
#> tracemem[0x0000000024697c90 -> 0x0000000024697c20]: [[<-.data.frame [[<-
#> tracemem[0x0000000024697c20 -> 0x0000000024697bb0]: [[<-.data.frame [[<-
#> tracemem[0x0000000024697bb0 -> 0x0000000024697b40]:
#> tracemem[0x0000000024697b40 -> 0x0000000024697ad0]: [[<-.data.frame [[<-
#> tracemem[0x0000000024697ad0 -> 0x0000000024697a60]: [[<-.data.frame [[<-
#> tracemem[0x0000000024697a60 -> 0x00000000246979f0]:
#> tracemem[0x00000000246979f0 -> 0x0000000024697980]: [[<-.data.frame [[<-
#> tracemem[0x0000000024697980 -> 0x0000000024697910]: [[<-.data.frame [[<-
#> tracemem[0x0000000024697910 -> 0x00000000246978a0]:
#> tracemem[0x00000000246978a0 -> 0x0000000024697830]: [[<-.data.frame [[<-
#> tracemem[0x0000000024697830 -> 0x00000000246977c0]: [[<-.data.frame [[<-
#> tracemem[0x00000000246977c0 -> 0x0000000024697750]:
#> tracemem[0x0000000024697750 -> 0x00000000246976e0]: [[<-.data.frame [[<-
#> tracemem[0x00000000246976e0 -> 0x0000000024697670]: [[<-.data.frame [[<-
untracemem(x)
y <- as.list(x)
cat(tracemem(y), "\n")
#> <0000000024697600>
for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
#> tracemem[0x0000000024697600 -> 0x00000000247ec708]:
untracemem(y)
This reproduces the book's result: 15 copies of the data frame (three per iteration) and a single copy of the list.
R v4.0.0
However, if we run the same example code under R v4.0.0:
#> R version 4.0.0 (2020-04-24) -- "Arbor Day"
x <- data.frame(matrix(runif(5 * 1e4), ncol = 5))
medians <- vapply(x, median, numeric(1))
for (i in seq_along(medians)) {
  x[[i]] <- x[[i]] - medians[[i]]
}
cat(tracemem(x), "\n")
#> <00000000236B0C50>
for (i in 1:5) {
  x[[i]] <- x[[i]] - medians[[i]]
}
#> tracemem[0x00000000236b0c50 -> 0x00000000237a7a90]:
#> tracemem[0x00000000237a7a90 -> 0x00000000237a7a20]: [[<-.data.frame [[<-
#> tracemem[0x00000000237a7a20 -> 0x00000000237a79b0]:
#> tracemem[0x00000000237a79b0 -> 0x00000000237a7940]: [[<-.data.frame [[<-
#> tracemem[0x00000000237a7940 -> 0x00000000237a78d0]:
#> tracemem[0x00000000237a78d0 -> 0x00000000237a7860]: [[<-.data.frame [[<-
#> tracemem[0x00000000237a7860 -> 0x00000000237a77f0]:
#> tracemem[0x00000000237a77f0 -> 0x00000000237a7780]: [[<-.data.frame [[<-
#> tracemem[0x00000000237a7780 -> 0x00000000237a7710]:
#> tracemem[0x00000000237a7710 -> 0x00000000237a76a0]: [[<-.data.frame [[<-
untracemem(x)
y <- as.list(x)
cat(tracemem(y), "\n")
#> <00000000237A7630>
for (i in 1:5) {
  y[[i]] <- y[[i]] - medians[[i]]
}
untracemem(y)
This shows the effect of the change: the number of data frame copies drops from 15 to 10 (two per iteration), and the list is no longer copied at all.
To answer the OP's question directly: the copy of the list was unnecessary and was made only because of how the NAMED mechanism tracked whether an object might be shared. With reference counting in R v4.0.0, R can tell that the list has a single reference, so it is modified in place as expected.
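To compare R versions without counting tracemem lines by eye, the book's loop can be wrapped in a helper that captures tracemem()'s output with utils::capture.output() and counts the reported copies. This is a sketch: count_loop_copies() is a hypothetical name, not part of base R or the book, and it assumes an R build with memory profiling enabled. The objects are created inside the function so that no other binding (for example, one held by an IDE's environment pane) adds a reference.

```r
# Hypothetical helper (not from base R or the book): rebuilds the book's
# example, runs the median-centring loop while the object is traced, and
# returns the number of copies tracemem() reported.
count_loop_copies <- function(as_list = FALSE, n = 1e4) {
  if (!capabilities("profmem")) {
    stop("this R build lacks memory profiling, so tracemem() is unavailable")
  }
  x <- data.frame(matrix(runif(5 * n), ncol = 5))
  if (as_list) x <- as.list(x)
  medians <- vapply(x, median, numeric(1))

  tracemem(x)
  # tracemem() writes to R's standard output, so capture.output() collects
  # one "tracemem[old -> new]: ..." line per copy made during the loop.
  out <- utils::capture.output(
    for (i in seq_along(medians)) {
      x[[i]] <- x[[i]] - medians[[i]]
    }
  )
  untracemem(x)
  sum(startsWith(out, "tracemem["))
}

count_loop_copies()                # copies made for the data frame
count_loop_copies(as_list = TRUE)  # copies made for the list
```

Running both calls under R v3.6.3 and R v4.0.0 should mirror the tracemem listings above, although exact counts may vary with other R versions.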