10

I am trying to do multiple steps in one line in R to select a value from a data.table (dt) with multiple criteria.

For example:

set.seed(123)
dt <- data.table(id = rep(letters[1:2],2),
             time = rnorm(4),
             value = rnorm(4)*100)

#    id        time      value
# 1:  a -0.56047565   12.92877
# 2:  b -0.23017749  171.50650
# 3:  a  1.55870831   46.09162
# 4:  b  0.07050839 -126.50612

# Now I want to select the last (maximum time) value from id == "a"
# My pseudo data.table code looks like this

dt[order(time) & id == "a" & .N, value]
# [1] 12.92877 46.09162  

Instead of getting the two values I want to get only the last value (which has the higher time-value).

If I do it step-by-step it works:

dt <- dt[order(time) & id == "a"]
dt[.N, value]
# [1] 46.09162

Bonus:

How can I order a data.table without copying the data.table: ie

dt <- dt[order(time)]

without the <-. Similar to the :=-operator such as in dt[, new_val := value*2] which creates the new variable without copying the whole data.table.

Thank you, any idea is greatly appreciated!

David
  • 9,216
  • 4
  • 45
  • 78
  • 1
    You say that `dt[order(time) & id == "a"]` works, but `order` gives an integer, not a logical and is simply coerced to `TRUE` so this is equivalent to `dt[TRUE & id=="a"]` or `dt[id=="a"]`. Your example is so small that it *looks like* it works here. – Frank Jun 02 '15 at 21:33

2 Answers2

10

For you first question, try

dt[id == "a", value[which.max(time)]]
## [1] 46.09162

For bonus question, try the setorder function which will order your data in place (you can also order in descending order by adding - in front of time)

setorder(dt, time)
dt
#    id        time      value
# 1:  a -0.56047565   12.92877
# 2:  b -0.23017749  171.50650
# 3:  b  0.07050839 -126.50612
# 4:  a  1.55870831   46.09162

Also, if you already ordering your data by time, you could do both - order by reference and select value by condition- in single line

setorder(dt, time)[id == "a", value[.N]]
Community
  • 1
  • 1
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
6

I know this is an older question, but I'd like to add something. Having a similar problem I stumbled on this question and although David Arenburg's answer does provide a solution to this exact question, I had trouble with it when trying to replace/overwrite values from that filtered and ordered data.table, so here is an alternative way which also lets you apply <- calls directly onto the filtered and ordered data.tabe.

The key is that data.table lets you concatenate several [] to each other.

Example:

dt[id=="a", ][order(time), ][length(value), "value"] <- 0

This also works for more than one entry, simply provide a suitable vector as replacement value.

Note however that the .N which is a list object needs to be replaced with e.g. length of the column because data.table expects an integer at this position in i and the column which you want to select in j needs to be wrapped by "".

I found this to be the more intuitive way and it lets you not only filter the data table, but also manipulate its values without needing to worry about temporary tables.

N. Maks
  • 539
  • 3
  • 15