
Is there a way to prevent data.table from printing the new data.table after assigning a new column by reference? I gather the standard behaviour is

library(data.table)
example(data.table)
DT
#    x y  v
# 1: a 1 42
# 2: a 3 42
# 3: a 6 42
# 4: b 1 11
# 5: b 3 11
# 6: b 6 11
# 7: c 1  7
# 8: c 3  8
# 9: c 6  9

DT[,z:=1:nrow(DT)]

#    x y  v z
# 1: a 1 42 1
# 2: a 3 42 2
# 3: a 6 42 3
# 4: b 1 11 4
# 5: b 3 11 5
# 6: b 6 11 6
# 7: c 1  7 7
# 8: c 3  8 8
# 9: c 6  9 9

i.e. the table is printed to screen after assignment. Is there a way to stop data.table from showing the new table after assigning the new column z? I know I can stop this behaviour by saying

DT <- copy(DT[,z:=1:nrow(DT)])

but that is defeating the purpose of := (which is designed to avoid copies).
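To make the cost of that workaround concrete, here is a minimal sketch. The small table is a stand-in built for illustration, and `address()` is data.table's helper for reporting where an object lives in memory.

```r
library(data.table)

# Illustrative stand-in for the DT used in the question
DT <- data.table(x = rep(c("a", "b", "c"), each = 3),
                 y = c(1, 3, 6),
                 v = 1:9)

before <- address(DT)

# The workaround from the question: assign by reference, then copy the result
DT <- copy(DT[, z := 1:nrow(DT)])

# copy() allocated a new table, so DT now lives at a different address
address(DT) != before
```

The column is added either way; the extra `copy()` only duplicates the whole table to silence the printing.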

MichaelChirico
Florian Oswald

2 Answers


Since assigning a data.table with <- doesn't make a copy, you can use <-:

Create a data.table object:

library(data.table)
di <- data.table(iris)

Create a new column:

di <- di[, z:=1:nrow(di)]
di

#       Sepal.Length Sepal.Width Petal.Length Petal.Width Species  z
#  [1,]          5.1         3.5          1.4         0.2  setosa  1
#  [2,]          4.9         3.0          1.4         0.2  setosa  2
#  [3,]          4.7         3.2          1.3         0.2  setosa  3
#  [4,]          4.6         3.1          1.5         0.2  setosa  4
#  [5,]          5.0         3.6          1.4         0.2  setosa  5
#  [6,]          5.4         3.9          1.7         0.4  setosa  6
#  [7,]          4.6         3.4          1.4         0.3  setosa  7
#  [8,]          5.0         3.4          1.5         0.2  setosa  8
#  [9,]          4.4         2.9          1.4         0.2  setosa  9
# [10,]          4.9         3.1          1.5         0.1  setosa 10
# First 10 rows of 150 printed. 
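One way to check that this re-assignment does not copy is to compare memory addresses before and after. This is a sketch that assumes a freshly created table; `address()` is a data.table helper.

```r
library(data.table)

di <- data.table(iris)
before <- address(di)

# Re-assigning the result of := binds the same object to the same name
di <- di[, z := 1:nrow(di)]

# TRUE when no copy was made
identical(address(di), before)
```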

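It is also worth remembering that R auto-prints only the visible result of a top-level expression. Inside a function, or in a script run through source() with its default echo = FALSE, nothing is auto-printed.

So, in a sourced script, you can simply use:

di[, z:=1:nrow(di)]

This will not produce any output when the script is run that way.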
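One setting where top-level values are not auto-printed is a script run through `source()` with its default `echo = FALSE`. The sketch below uses base R only; the script contents and variable names are made up for illustration.

```r
# Write a small script to a temporary file (hypothetical contents)
script <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(1, 2, 3)",
  "x   # a bare top-level value, which would auto-print at the console"
), script)

# source() defaults to echo = FALSE, so nothing is printed
quiet <- capture.output(source(script))

# With print.eval = TRUE the visible values are printed again
loud <- capture.output(source(script, print.eval = TRUE))

length(quiet)     # 0
length(loud) > 0  # TRUE
```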


Further info from Matthew Dowle:

Also see FAQ 2.21 and 2.22 :

2.21 Why does DT[i,col:=value] return the whole of DT? I expected either no visible value (consistent with <-), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.

So that compound syntax can work; e.g., DT[i,done:=TRUE][,sum(done)]. The number of rows updated is returned when verbosity is on, either on a per query basis or globally using options(datatable.verbose=TRUE).
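A small sketch of the compound syntax the FAQ refers to; the table and column names here are invented for illustration.

```r
library(data.table)

DT <- data.table(id = 1:6, done = FALSE)

# Update by reference, then query the updated table in the same statement
n_done <- DT[id > 4, done := TRUE][, sum(done)]
n_done  # 2
```

The chain works only because the first `[` returns the updated table rather than nothing.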

2.22 Ok, but can't the return value of DT[i,col:=value] be returned invisibly, then?

  • We tried to but R internally forces visibility on for [. The value of FunTab's eval column (see src/main/names.c) for [ is 0 meaning force R_Visible on (see R-Internals section 1.6). Therefore, when we tried invisible() or setting R_Visible to 0 directly ourselves, eval in src/main/eval.c would force it on again.
  • After getting used to this behaviour, you might grow to prefer it (we have). After all, how many times do we subassign using <- and then immediately look at the data to check it's ok?
  • We can mix := into a j which also returns data; a mixed update and select in one query. To detect whether j solely updates (and then behave differently) could be confusing.

Second update from Matthew Dowle:

We have now found a solution and v1.8.3 no longer prints the result when := is used. We will update FAQ 2.21 and 2.22.
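With printing suppressed, a trailing empty `[]` is the usual data.table idiom for asking to see the result anyway. A brief sketch, using a made-up table and a hypothetical helper function:

```r
library(data.table)

DT <- data.table(x = c("a", "b", "c"), v = c(42, 11, 7))

# Silent in data.table versions that suppress printing after :=
DT[, z := seq_len(nrow(DT))]

# Appending [] requests the usual printed result
DT[, w := v * 2][]

# Hypothetical helper: ending with d[] makes the returned table print normally
add_flag <- function(d) {
  d[, flag := v > 10]
  d[]
}
res <- add_flag(DT)
```

Because `:=` works by reference, `add_flag()` also modifies `DT` itself; pass `copy(DT)` if that is not wanted.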

MichaelChirico
Andrie
  • Thanks! I'll use `<-` in that case. – Florian Oswald Jul 06 '12 at 09:44
  • +10 I didn't think to try `DT<-DT[i,col:=value]`. The repetition of the variable name isn't great, but works. That gives me an idea how to get `DT[i,col:=value]` to return invisibly: now raised as [FR#2128](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=2128&group_id=240&atid=978), thanks! – Matt Dowle Jul 06 '12 at 10:32
  • +1 for the mighty clever hack + interesting snippet from the FAQ. – Josh O'Brien Jul 06 '12 at 12:58
  • @MatthewDowle Thank you for the edit as well as the fix in v1.8.3 – Andrie Sep 30 '12 at 16:29
  • Is there the potential for a hack to handle cases where the assignment takes place inside of a bracketed expression (e.g. a loop or if statement)? It seems to still happen in 1.8.8 on R 3.0.0. – russellpierce Sep 25 '13 at 20:22
  • I noticed that `:=` produces output when Knit HTML is used in RStudio. This is unexpected. Why does knitr produce output in the resulting HTML file when none is sent to the console? Is there a way to suppress output in this case? Is this a new question? – Brian D Feb 23 '15 at 00:59
  • @MattDowle I find that data.tables sometimes eval invisibly, and I suspect the changes introduced here are responsible. Example: `f <- function() { d <- data.table(l=letters, n=1:26); d[l=='a', n:=100]; d }; x <- f(); x; x`: the first x does not print, only the second. – malcook Nov 13 '17 at 20:50

For a very long data table name, it seems that the following is equivalent in performance and can be shorter (I prefer short names but sometimes need a longer name to remember what an object really contains):

invisible(Very.Long.Data.Table[i,col:=value])

compare with:

Very.Long.Data.Table<-Very.Long.Data.Table[i,col:=value]
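A runnable sketch of both forms, using a small made-up table in place of the placeholders `i`, `col` and `value`:

```r
library(data.table)

Very.Long.Data.Table <- data.table(id = 1:5, col = 0)

# Option 1: discard the visible return value; the update still happens in place
invisible(Very.Long.Data.Table[id > 3, col := 1])

# Option 2: re-assign the result to the same name
Very.Long.Data.Table <- Very.Long.Data.Table[id <= 3, col := -1]

Very.Long.Data.Table$col  # -1 -1 -1  1  1
```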

jimbod119