
I start from a simple data.table, for example:

dt1 <- fread("
col1 col2 col3
AAA  ab   cd
BBB  ef   gh
BBB  ij   kl
CCC  mn   nm")

From it I create a new table, for example:

dt1[,
    .(col3, new=.N),
    by=col1]

>   col1 col3 new
>1:  AAA   cd   1
>2:  BBB   gh   2
>3:  BBB   kl   2
>4:  CCC   nm   1

This works fine when I give the column names explicitly. But when I hold them in variables and try to use with=F, I get an error:

colBy   <- 'col1'
colShow <- 'col3' 

dt1[,
    .(colShow, 'new'=.N),
    by=colBy,
    with=F] 
# Error in `[.data.table`(dt1, , .(colShow, new = .N), by = colBy, with = F) :   object 'ansvals' not found

I could not find any information about this error so far.

Vasily A

2 Answers


You get this error because with=FALSE tells data.table to treat j the way a data.frame would. It therefore expects a vector of column names, not an expression to be evaluated in j such as new=.N.

From the documentation of ?data.table about with:

By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names or a numeric vector of column positions to select, and the value returned is always a data.table.

When you use with=FALSE, you have to give the column names in j without a . before (), like this: dt1[, (colShow), with=FALSE]. Other options are dt1[, c(colShow), with=FALSE] or dt1[, colShow, with=FALSE]. The same result can be obtained with dt1[, .(col3)].

To sum up: with = FALSE selects columns the data.frame way, so j must then be a vector of column names or positions rather than an expression.
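As a minimal sketch of the selection forms described above (assuming the data.table package is installed; the result names res_with, res_sdcols and res_direct are made up for illustration), the following rebuilds dt1 and selects the same column through a variable and by its explicit name:

```r
library(data.table)

# Same data as in the question, built with data.table() instead of fread()
dt1 <- data.table(
  col1 = c("AAA", "BBB", "BBB", "CCC"),
  col2 = c("ab", "ef", "ij", "mn"),
  col3 = c("cd", "gh", "kl", "nm")
)

colShow <- "col3"

# j is a character vector of column names, so with = FALSE applies
res_with <- dt1[, colShow, with = FALSE]

# .SD with .SDcols also selects columns by name; here j is evaluated normally
res_sdcols <- dt1[, .SD, .SDcols = colShow]

# Explicit column name, for comparison
res_direct <- dt1[, .(col3)]
```

All three results are one-column data.tables holding col3, and none of them involves by.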

Also, by using by = colBy you are telling data.table to evaluate j, which contradicts with = FALSE.

From the documentation of ?data.table about j:

A single column name, single expression of column names, list() of expressions of column names, an expression or function call that evaluates to list (including data.frame and data.table which are lists, too), or (when with=FALSE) a vector of names or positions to select.

j is evaluated within the frame of the data.table; i.e., it sees column names as if they are variables. Use j=list(...) to return multiple columns and/or expressions of columns. A single column or single expression returns that type, usually a vector. See the examples.

See also points 1.d and 1.g of the introduction vignette of data.table.
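One possible way to keep by while the column names live in variables is to leave with at its default and pass the names through .SDcols. This is only a sketch under that assumption, not necessarily the approach the vignette recommends; the name res is made up for illustration:

```r
library(data.table)

dt1 <- data.table(
  col1 = c("AAA", "BBB", "BBB", "CCC"),
  col2 = c("ab", "ef", "ij", "mn"),
  col3 = c("cd", "gh", "kl", "nm")
)

colBy   <- "col1"
colShow <- "col3"

# .SD holds only the columns named in .SDcols for each group;
# c() appends the per-group row count as an extra list element
res <- dt1[, c(.SD, .(new = .N)), by = colBy, .SDcols = colShow]
```

Because j is still an ordinary expression here, grouping works as usual and the selected column keeps its original name.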


ansvals is a name used in data.table internals. You can see where it appears in the code by using ctrl+f (Windows) or cmd+f (macOS) here.

Jaap
  • thank you for explanation! Does it actually mean that there's just no way to use `by=` when column names are stored in variables? – Vasily A Nov 22 '15 at 19:16
  • @VasilyA That's certainly possible, but you have to do it in the correct way. See [here](http://stackoverflow.com/questions/32940580/convert-some-column-classes-in-data-table/32942319#32942319) or [here](http://stackoverflow.com/questions/33772830/how-to-set-multiple-columns-and-selected-rows-in-data-table-to-value-from-other/33774525#33774525) for examples. You might also want to read the [getting started guide](https://github.com/Rdatatable/data.table/wiki/Getting-started) – Jaap Nov 22 '15 at 19:41
  • well, in those examples `by=` is not used at all, this makes it quite different... I will read again the Getting started guide and maybe post a separate question specifying what exactly I need. – Vasily A Nov 22 '15 at 20:33

The error object 'ansvals' not found looks like a bug to me. It should either be a helpful message or just work. I've filed issue #1440 linking back to this question, thank you.

Jaap is completely correct. Following on from his answer, you can use get() in j like this:

dt1
#   col1 col2 col3
#1:  AAA   ab   cd
#2:  BBB   ef   gh
#3:  BBB   ij   kl
#4:  CCC   mn   nm
colBy
#[1] "col1"
colShow
#[1] "col3"
dt1[,.(get(colShow),.N),by=colBy]
#   col1 V1 N
#1:  AAA cd 1
#2:  BBB gh 2
#3:  BBB kl 2
#4:  CCC nm 1
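In the output above the selected column comes back as V1. As a hedged sketch of one way to get the names from the question (col3 and new), the get() call can be given a temporary name and renamed afterwards with setnames(); the placeholder name shown and the result name res are made up for illustration:

```r
library(data.table)

dt1 <- data.table(
  col1 = c("AAA", "BBB", "BBB", "CCC"),
  col2 = c("ab", "ef", "ij", "mn"),
  col3 = c("cd", "gh", "kl", "nm")
)

colBy   <- "col1"
colShow <- "col3"

# Name the get() result with a placeholder, then rename it by reference
# to whatever column name is stored in colShow
res <- dt1[, .(shown = get(colShow), new = .N), by = colBy]
setnames(res, "shown", colShow)
```

The result then has the columns col1, col3 and new, matching the table built with explicit column names.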
Matt Dowle
  • thank you Matt! I was just going to post a question asking if `get()` could be a good solution :) – Vasily A Nov 23 '15 at 10:34