
I want to change factor levels of a column using setattr. However, when the column is selected the standard data.table way (dt[ , col]), the levels are not updated. On the other hand, when selecting the column in an unorthodox way in a data.table setting—namely using $—it works.

library(data.table)

# Some data 
d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
d
#    x y
# 1: b 1
# 2: a 2
# 3: a 3
# 4: b 4

# We want to change levels of 'x' using setattr
# New desired levels
lev <- c("a_new", "b_new")

# Select column in the standard data.table way 
setattr(x = d[ , x], name = "levels", value = lev)

# Levels are not updated
d
#    x y
# 1: b 1
# 2: a 2
# 3: a 3
# 4: b 4

# Select column in a non-standard data.table way using $
setattr(x = d$x, name = "levels", value = lev)

# Levels are updated
d
#        x y
# 1: b_new 1
# 2: a_new 2
# 3: a_new 3
# 4: b_new 4

# Just check if d[ , x] really is the same as d$x
d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
identical(d[ , x], d$x)
# [1] TRUE
# Yes, it seems so

It feels like I'm missing some data.table (R?) basics here. Can anyone explain what's going on?


I have found two other posts on setattr and levels:

setattr on levels preserving unwanted duplicates (R data.table)

How does one change the levels of a factor column in a data.table

Both of them used $ to select the column. Neither of them mentioned the [ , col] way.

Henrik
  • According to docs, there shouldn't be a problem - *‘setattr’ accepts any input; e.g, list, columns of a ‘data.frame’ or ‘data.table’.* I'm getting the same issue as you though. – thelatemail Jan 09 '17 at 00:04
  • This behavior makes sense to me. Fwiw, I always just do the setattr inside j, which seems to work fine here: `d[, setattr(x, "levels", lev)]` – Frank Jan 09 '17 at 02:43
  • @Frank Thank you for your feedback. I was so tired yesterday so I only managed to press the up-arrow before I fell asleep. Dreamt sweet dreams (or was it nightmares?) after the semi-traumatic `j` experience. At least I was right in my "_It feels like I'm missing some `data.table` (`R`?) basics here_" ;) – Henrik Jan 09 '17 at 12:51
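
The `j` idiom from the comment above can be sketched as follows. This is a minimal illustration reusing the question's data, not an official recommendation; wrapping the call in `invisible()` is only an assumption made here to suppress printing of the returned value.

```r
library(data.table)

d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
lev <- c("a_new", "b_new")

# Call setattr inside j, as suggested in the comment, so that it is
# applied to the column x while j is being evaluated
invisible(d[, setattr(x, "levels", lev)])

# Inspect the levels of the column stored in d
levels(d$x)
```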

1 Answer


It might help to understand if you look at the address from both expressions:

address(d$x)
# [1] "0x10e4ac4d8"
address(d$x)
# [1] "0x10e4ac4d8"


address(d[,x])
# [1] "0x105e0b520"
address(d[,x])
# [1] "0x105e0a600"

Note that the address returned by the first expression, d$x, does not change when you call it multiple times, whereas the address returned by the second, d[,x], is different on each call. This indicates that d[,x] makes a new copy of the column every time it is evaluated, so setattr modifies only that temporary copy and has no effect on the original data.table.
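
To tie this back to the question, here is a small sketch that reuses the question's data and checks the levels after each attempt. The variable names (`same_address`, `levels_after_copy`, `levels_after_ref`) are made up for illustration.

```r
library(data.table)

d <- data.table(x = factor(c("b", "a", "a", "b")), y = 1:4)
lev <- c("a_new", "b_new")

# d$x hands back the column vector stored in d, so its address is stable
same_address <- address(d$x) == address(d$x)

# d[, x] builds a new vector; setting attributes on it leaves d untouched
setattr(d[, x], "levels", lev)
levels_after_copy <- levels(d$x)   # still "a" "b"

# setattr on d$x changes the stored column by reference
setattr(d$x, "levels", lev)
levels_after_ref <- levels(d$x)    # now "a_new" "b_new"
```

The first setattr call succeeds silently because it does modify an object, just not the one stored in d, which is why no error or warning points to the problem.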

Psidom
  • @Henrik - from my naïve perspective, it seems to contradict the docs. At a minimum, I think it should be noted. – thelatemail Jan 09 '17 at 00:21
  • I think it's reasonable considering what you usually do at the `j` position, i.e. summarizing, creating new columns and new lists unless you are using `:=`. But just a guess. – Psidom Jan 09 '17 at 00:26
  • @Psidom Yeah, that makes sense. Thanks. In my ignorance, I thought I was able to do 'only' a simple select of the column, in the `$` / subsetting sense of select, but doing it in `j` is of course more of a calculation style select. Indeed, I _was_ lacking some `data.table` basics. – Henrik Jan 09 '17 at 07:34
  • @Henrik if you'll check out the source code of `data.table:::\`[.data.table\`` you'll see how massive it is. This is exactly why the `set` function was invented IIRC. And the standard way to select a *single* column in `data.table` will always be using either `$` or `[[`, which are both primitives compared to that mess above. – David Arenburg Jan 09 '17 at 07:39