I have a data.table called client, that has the following columns:

    "location"   "clientID"   "gender"     "age"  "startYear"  "ag1"        "ag2"      

There is the age variable and two factorised versions of it: ag1 and ag2.

Now I want to group my data using the ag1 factorization:

agegroup <- sym(agegroup)
clientTotal <- client[,.(NOinsureds=length(clientID)), by = .(gender, agegroup, startYear)]

which results in an error:

Error in `[.data.table`(clientData, , .(NOinsureds = length(clientID)), : column or expression 2 of 'by' or 'keyby' is type symbol. Do not quote column names. Usage: DT[,sum(colC),by=list(colA,month(colB))]

I assume this is because of NSE (non-standard evaluation). I have read a little about it online, which is why I included agegroup <- sym(agegroup), but it does not seem to work.

Any suggestions?

I will be manipulating the data quite a lot, and I will also be writing functions, so maybe keep that in mind when recommending solutions :)

linog
Nneka

2 Answers

Yes, your problem seems related to NSE. Rather than using .(), you can pass a character vector of column names to by.

agegroup <- "ag1"
clientTotal <- client[,.(NOinsureds=length(clientID)),
                        by = c("gender", agegroup, "startYear")]
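Since the question mentions writing functions, here is a minimal sketch of the same idea wrapped in one. The toy `client` table and the helper name `count_by_agegroup` are made up for illustration; only the column names come from the question.

```r
library(data.table)

# Hypothetical toy data standing in for the asker's `client` table
client <- data.table(
  clientID  = 1:6,
  gender    = c("f", "m", "f", "m", "f", "m"),
  startYear = c(2019, 2019, 2020, 2020, 2019, 2020),
  ag1       = factor(c("young", "old", "young", "old", "young", "old")),
  ag2       = factor(c("a", "b", "c", "a", "b", "c"))
)

# count_by_agegroup is a hypothetical helper name.
# `agegroup` is a plain string; c() builds the character vector that `by` expects.
count_by_agegroup <- function(dt, agegroup) {
  dt[, .(NOinsureds = length(clientID)),
     by = c("gender", agegroup, "startYear")]
}

res1 <- count_by_agegroup(client, "ag1")  # grouped by ag1
res2 <- count_by_agegroup(client, "ag2")  # grouped by ag2
```

Switching the grouping only means passing a different string, so no `sym()` call is needed for this case.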

data.table syntax works well inside functions because NSE is not hard to handle. If it helps, I wrote a post on the subject of NSE in data.table. You can also find many resources online.

linog
  • The point is that I do not want to use character vectors, because my input will be changing (as part of a function). If I want to have my data grouped by ag2, I will just change `agegroup <- "ag2"`, then `agegroup <- sym(agegroup)` and use `agegroup` in `data.table`... does that make sense? – Nneka Apr 13 '20 at 16:18
  • I don't understand why you need `sym`. Could you show an example of what you want to do? In my example (just edited), you can use agegroup <- "ag1" or agegroup <- "ag2" – linog Apr 13 '20 at 16:21
  • If you need to use your variable in the `i` or `j` argument of `[...]`, an example would help – linog Apr 13 '20 at 16:22
  • your post is great btw, very helpful – Nneka Apr 13 '20 at 18:48
  • Thanks, happy it can help others ! – linog Apr 13 '20 at 18:59

Based on your clarifications in the comments, you should be able to use get().

library(data.table)
agegroup <- 'ag1'
clientTotal <- client[,.(NOinsureds=length(clientID)), by = .(gender, age = get(agegroup), startYear)]

Consider this example:

library(data.table)
cars <- data.table(mtcars)
strBy <- "carb"
cars[,.(avg.mpg = mean(mpg)), by = .(gear,Carborator = get(strBy))]
    gear Carborator  avg.mpg
 1:    4          4 19.75000
 2:    4          1 29.10000
 3:    3          1 20.33333
 4:    3          2 17.15000
 5:    3          4 12.62000
 6:    4          2 24.75000
 7:    3          3 16.30000
 8:    5          2 28.20000
 9:    5          4 15.80000
10:    5          6 19.70000
11:    5          8 15.00000
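
As a rough sketch of how the get() approach could look inside a function: the helper name `mean_mpg_by` and the temporary column name `grp` are made up for illustration, and the code assumes the table has no column with the same name as the function argument, since get() would then pick up the column instead.

```r
library(data.table)

# mean_mpg_by is a hypothetical helper name.
# by_col is a string naming the second grouping column.
mean_mpg_by <- function(dt, by_col) {
  # get(by_col) looks up the column whose name is stored in by_col;
  # `grp` is a temporary label for the resulting grouping column
  res <- dt[, .(avg.mpg = mean(mpg)), by = .(gear, grp = get(by_col))]
  # rename the temporary label to the real column name
  setnames(res, "grp", by_col)
  res[]
}

cars <- data.table(mtcars)
by_carb <- mean_mpg_by(cars, "carb")
by_cyl  <- mean_mpg_by(cars, "cyl")
```

Renaming with setnames() afterwards keeps the output column named after the actual grouping variable rather than a fixed label.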
Ian Campbell