
It seems that knitr doesn't understand that `DT[, a:=1]` should not print `DT` to the document. Is there a way to stop this behaviour?

Example knitr document:

Data.Table Markdown
========================================================
Suppose we make a `data.table` in **R Markdown**
```{r}
library(data.table)
DT = data.table(a = rnorm(10))
```
Notice that it doesn't display the contents until we do a
```{r}
DT
```
style command.  However, if we want to use `:=` to create another column
```{r}
DT[, c:=5]
```
It would appear that the absence of an equals sign tricks `knitr` into thinking this
is to be printed.

Knitr Output:

[screenshot of the rendered document; image not preserved]

Is this a knitr bug or a data.table bug?

EDIT

I have only just noticed that knitr also behaves oddly when echoing the code. Look at the output above: my source code has `DT[, c:=5]`, but what knitr renders is

DT[, `:=`(c, 5)]

Weird...

EDIT 2: Caching

Caching also seems to have a problem with := but that must be a different cause, so is a separate question here: why does knitr caching fail for data.table `:=`?

Corvus
  • I find it hard to follow how this is a `data.table` bug. What happens if you assign it back to DT? (although I'm not sure if MatthewDowle would like it). – Arun Mar 07 '13 at 09:13
  • Good thought! `DT=DT[, :=]` is a viable workaround. I suspect it must be `knitr` but I don't know how `knitr` decides if it should output or not - is it purely the presence of assignment? `data.table` has clearly done something to stop output in the console - perhaps this is not comprehensive enough? – Corvus Mar 07 '13 at 09:28
  • If you wrap the expression in `invisible()` then knitr doesn't print it, which makes me think knitr is being cleverer than just not printing if there's an assignment. But then I think maybe data.table should return an invisible object in this case, hence a bug in data.table might be a possibility. Why doesn't it get printed anyway? That has to be data.table's business... – Spacedman Mar 07 '13 at 09:32
  • @Spacedman, for a `:=` (assign by reference), yes it doesn't. You'll have to either use `print(DT[, LHS := RHS])` or `DT[, LHS := RHS][]`. – Arun Mar 07 '13 at 09:40
  • @Corone, now I see the point as to why you think there might be something with `data.table`. – Arun Mar 07 '13 at 09:41
  • If I get the latest data.table from R-forge and load it with devtools I do see printed output with `DT[,c:=1]`. But only when loaded with devtools. The package does have some crufty-looking mucking with `invisible()` here and there and a `.global` object. Hmmm. – Spacedman Mar 07 '13 at 09:44

5 Answers


Update Oct 2014. Now in data.table v1.9.5 :

:= no longer prints in knitr for consistency with behaviour at the prompt, #505. Output of a test knit("knitr.Rmd") is now in data.table's unit tests.

and related :

if (TRUE) DT[,LHS:=RHS] now doesn't print (thanks to Jureiss, #869). Test added. To get this to work we've had to live with one downside: if a := is used inside a function with no DT[] before the end of the function, then the next time DT is typed at the prompt, nothing will be printed. A repeated DT will print. To avoid this: include a DT[] after the last := in your function. If that is not possible (e.g., it's not a function you can change) then print(DT) and DT[] at the prompt are guaranteed to print. As before, adding an extra [] on the end of a := query is a recommended idiom to update and then print; e.g. > DT[,foo:=3L][]
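As a rough sketch of the idiom recommended in that news item (assuming data.table v1.9.6 or later is installed; the helper `add_flag` and the column names are made up for illustration):

```r
library(data.table)

# Hypothetical helper that updates its argument by reference.
add_flag <- function(dt) {
  dt[, flag := TRUE]  # := modifies dt in place and returns it invisibly
  dt[]                # trailing [] so a later `DT` at the prompt prints normally
}

DT <- data.table(a = 1:3)
add_flag(DT)          # the trailing [] makes this result visible

# Update and then print in a single step:
DT[, foo := 3L][]
```

If the function cannot be changed, `print(DT)` or `DT[]` at the prompt should still print, as the news item says.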



Previous answer kept for posterity (the global$depthtrigger business is no longer done as from data.table v1.9.5 so this is no longer true) ...

Just to be clear I understand then: knitr is printing when you don't want it to.

Try increasing data.table:::.global$depthtrigger a little bit at the start of the script.

This will be 3 for you currently :

data.table:::.global$depthtrigger
[1] 3

I don't know how much eval depth knitr adds to the stack. But try changing the trigger to 4 first; i.e.

assign("depthtrigger", 4, data.table:::.global)

and at the end of the knitr script ensure to set it back to 3. If 4 doesn't work, try 5, then 6. If you get to 10 give up and I'll think again. ;-P

Why might this work?

See NEWS from v1.8.4 :

DT[,LHS:=RHS,...] no longer prints DT. This implements #2128 "Try again to get DT[i,j:=value] to return invisibly". Thanks to discussions here :
how to suppress output when using `:=` in R {data.table}, prior to v1.8.3?
http://r.789695.n4.nabble.com/Avoiding-print-when-using-tp4643076.html
FAQs 2.21 and 2.22 have been updated.

FAQ 2.21 Why does DT[i,col:=value] return the whole of DT? I expected either no visible value (consistent with <-), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.
This has changed in v1.8.3 to meet your expectations. Please upgrade. The whole of DT is returned (now invisibly) so that compound syntax can work; e.g., DT[i,done:=TRUE][,sum(done)]. The number of rows updated is returned when verbosity is on, either on a per query basis or globally using options(datatable.verbose=TRUE).
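A small sketch of the compound syntax and the verbosity switch mentioned in FAQ 2.21 (the table and its columns are invented for the example, and the wording of the verbose messages depends on the data.table version):

```r
library(data.table)

DT <- data.table(id = 1:5, done = FALSE)

# The update returns DT invisibly, so a second [ can be chained onto it.
n_done <- DT[id > 3, done := TRUE][, sum(done)]
n_done

# Verbosity for a single query; options(datatable.verbose = TRUE)
# would switch it on globally instead.
DT[id == 1, done := TRUE, verbose = TRUE]
```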

FAQ 2.22 Ok, thanks. What was so difficult about the result of DT[i,col:=value] being returned invisibly?
R internally forces visibility on for [. The value of FunTab's eval column (see src/main/names.c) for [ is 0 meaning force R_Visible on (see R-Internals section 1.6). Therefore, when we tried invisible() or setting R_Visible to 0 directly ourselves, eval in src/main/eval.c would force it on again. To solve this problem, the key was to stop trying to stop the print method running after a :=. Instead, inside := we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.
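The forced visibility can be observed in base R without data.table. This is only a sketch of the behaviour FAQ 2.22 describes; the class name `visdemo` and the function `f` are made up for the illustration:

```r
# Toy S3 object; "visdemo" is a hypothetical class name.
x <- structure(list(1, 2, 3), class = "visdemo")

# A `[` method that tries to return its result invisibly.
`[.visdemo` <- function(x, i) {
  invisible(structure(unclass(x)[i], class = "visdemo"))
}

# An ordinary function returning invisibly, for comparison.
f <- function(x) invisible(x)

vis_fun    <- withVisible(f(x))$visible   # invisibility is respected here
vis_subset <- withVisible(x[1])$visible   # `[` switches visibility back on

print(c(plain_function = vis_fun, subset_method = vis_subset))
```

`withVisible()` reports the same visibility flag that the REPL consults when deciding whether to auto-print a result.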

That global flag is data.table:::.global$print. At the top of data.table:::print.data.table you'll see the method check it. That's because there is no known way to suppress printing from [ (as FAQ 2.22 explains).

So, inside := inside [.data.table it looks to see how "deep" this call is :

if (Cstack_info()[["eval_depth"]] <= .global$depthtrigger) {
    suppPrint = function(x) { .global$print=FALSE; x }
    # Suppress print when returns ok not on error, bug #2376.
    # Thanks to: https://stackoverflow.com/a/13606880/403310
    # All appropriate returns following this point are
    # wrapped i.e. return(suppPrint(x)).
}

Essentially that's just saying: if DT[,x:=y] is running at the prompt, then I know the REPL is going to call the print method on my result, beyond my control. Ok, so given the print method is going to run, I'm going to suppress it inside that print method by setting a flag (since the print method that runs (i.e. print.data.table) is something I can control).
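The flag mechanism can be mimicked with a toy class in base R. This is a simplified model and not data.table's actual source; every name in it (`.state`, `flagdemo`, `update_quietly`) is hypothetical:

```r
# Shared state, standing in for data.table's internal flag environment.
.state <- new.env()
.state$print <- TRUE

print.flagdemo <- function(x, ...) {
  if (!.state$print) {
    .state$print <- TRUE   # consume the flag so the next print works
    return(invisible(x))
  }
  cat("<flagdemo with", length(unclass(x)), "elements>\n")
  invisible(x)
}

# Stand-in for `:=`: sets the flag just before returning its result.
update_quietly <- function(x) {
  .state$print <- FALSE
  x
}

obj <- structure(list(1, 2), class = "flagdemo")

out1 <- capture.output(print(update_quietly(obj)))  # suppressed by the flag
out2 <- capture.output(print(obj))                  # prints as usual

# The hazard: the flag is set but no print follows immediately.
tmp  <- update_quietly(obj)
out3 <- capture.output(print(obj))  # swallowed by the stale flag
out4 <- capture.output(print(obj))  # prints again
```

The last three lines show why the flag should only be set when a print is known to follow: otherwise the next unrelated print is silently swallowed.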

In knitr's case it's simulating the REPL in a clever way. It isn't really a script, iiuc, otherwise DT[,x:=y] wouldn't print anyway for that reason. But because it's simulating REPL via an eval there is an extra level of eval depth for code run from knitr. Or something similar (I don't know knitr).

Which is why I'm thinking increasing the depthtrigger might do the trick.

Hacky/crufty, I agree. But if it works, and you let me know which value works, I can change data.table to be knitr aware and change the depthtrigger automatically. Or any better solutions are most welcome.

Matt Dowle
  • aargh! I just spent half an hour digging that out of the source code! Why is testing against C stack depth a good test for whether to invisibilise the result? – Spacedman Mar 07 '13 at 10:15
  • @Spacedman It's probably not _good_ per se. But it was the only way I found so far to know whether the result returned by this call to `[.data.table` is about to be printed by the REPL. The internals of `invisible()` also set a global flag, btw. Basically, R ignores the invisibility of `[.class`'s result and prints it anyway (FAQ 2.22). – Matt Dowle Mar 07 '13 at 10:38
  • Just done this, and it works with the caveat that `depthtrigger = 53` is the smallest number that prevents printing! – Corvus Mar 07 '13 at 10:50
  • @Corone Crikey. Anybody know why `knitr` might add so many levels to the stack? – Matt Dowle Mar 07 '13 at 10:56
  • Is `knitr` byte compiled? If not, that might explain it. – Matt Dowle Mar 07 '13 at 11:03
  • Because when developing `data.table` (I just source() .R code into .GlobalEnv, essentially) I need `depthtrigger` to be 9. But when the package is installed, `depthtrigger` can be the much lower 3. I assume that's something to do with DESCRIPTION containing `ByteCompile: TRUE`. – Matt Dowle Mar 07 '13 at 11:20
  • Why do you care whether `DT[,x:=y]` is running at the prompt? Why not just always set your magic invisibility flag in these cases? Then if it is at the prompt, it's silent, and if it isn't, nothing bad happens anyway. – Spacedman Mar 07 '13 at 17:18
  • @Spacedman Because something bad does happen. If the flag is set, and the print method doesn't run, fine. But then the flag is still set. The _next_ thing the user types might be merely `DT`. That would then not print, and be confusing. Typing `DT` again a second time would then really print it. So I need to be careful to only set the flag when I know the REPL is going to call the print method on the result. If not I must not set the flag. – Matt Dowle Mar 07 '13 at 17:27
  • Ugh. Really the only decent thing to do is not to subvert the idea that assignment is the only way of changing an object in R :) – Spacedman Mar 07 '13 at 18:04
  • @Spacedman Sometimes you gotta break some eggs to make an omelette, even if a few bits of shell do fall in. – Matt Dowle Mar 07 '13 at 18:42
  • The update introduced in v1.9.5 does not seem to work in v1.9.6 anymore (or 1.9.7 as of today). knitr still prints the data.table after a `:=` assignment. – Jav Sep 26 '16 at 12:31
  • @JavK Thanks for reporting. It is tested though and that test is passing. Can you create a reproducible example? Perhaps it's different to the existing test. The news item in v1.9.6 was ":= no longer prints in knitr for consistency with behaviour at the prompt, #505. Output of a test knit("knitr.Rmd") is now in data.table's unit tests. Thanks to Corone for the illustrated report." – Matt Dowle Sep 26 '16 at 16:40
  • @MattDowle I get the same as JavK. See here: http://rpubs.com/ac_evu/230003 and http://pastebin.com/raw/rb37eXZS – Andreas Nov 24 '16 at 21:40
  • @MattDowle sorry for a very late reply. Please see here: http://rpubs.com/Keell/230348 – Jav Nov 26 '16 at 13:26
  • @JavK Thanks. Here's the passing test in data.table. As far as I can see it covers yours. What's the difference? Your example does not seem to be reproducible in the sense that I can run it with `R --vanilla`. https://github.com/Rdatatable/data.table/blob/master/tests/knitr.Rmd – Matt Dowle Nov 28 '16 at 17:33
  • @JavK And here is the .save output that `R CMD check` compares to : https://github.com/Rdatatable/data.table/blob/master/tests/knitr.Rout.save – Matt Dowle Nov 28 '16 at 17:35
  • Linking to [issue #1930](https://github.com/Rdatatable/data.table/issues/1930), thanks. – Matt Dowle Nov 28 '16 at 21:25

Why not just use:

```{r, results='hide'}
DT[, c:=5]
```
Matt Pollock
  • Because it is likely not the only thing in the block. As workarounds go, self-assignment is probably better. Finally, what if your assignment calls a function (in place of '5') which prints? – Corvus Aug 19 '14 at 20:20

For anyone returning to this in 2017 with RMarkdown 1.3 and data.table 1.10 or similar, there was a resurgence of this bug, as identified and documented here

This was subsequently fixed in RMarkdown 1.4.

DaveRGP

Just surround the expression with invisible(). This works for me.
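A minimal sketch of what that looks like, assuming data.table is installed (the table here is invented; `withVisible()` is only used to show the visibility flag that auto-printing relies on):

```r
library(data.table)

DT <- data.table(a = 1:3)

# invisible() is the outermost call, so the result is marked invisible
# and auto-printing skips it.
res <- withVisible(invisible(DT[, c := 5]))
res$visible

# The update itself still happened by reference.
names(DT)
```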

guest
  • This is not ideal because if `echo = TRUE` and the output is to be used as a tutorial/reference, new learners might be led to believe it's normal/required to use `invisible` in regular coding (it's not). Unnecessary confusion. – MichaelChirico Dec 02 '16 at 19:03

I ran into the same problem and solved it fairly easily by re-assigning the variable. In your case:

DT <- DT[, ':=' (c, 5)]

It's a bit more verbose though, especially if the variable name is long.

  • This is the simplest method I've found, especially if you don't want to update to the dev version (as the current CRAN version seems to be v1.9.4). – DaveRGP Sep 10 '15 at 14:51