1

I have data that looks like this:

d <- data.table(SEQ = c(1, 1, 2, 2,
                        1, 2, 2, 3,
                        1, 1, 2, 3,
                        3, 4, 5, 5),
                TYPE = c("A", "A", "B", "B",
                         "C", "C", "C", "C",
                         "D", "D", "D", "D",
                         "D", "D", "D", "D"), 
                CLASS = c(1.1, 2.1, 3.3, 4.5,
                          1.3, 2.4, 3.4, 4.6,
                          1.4, 2.5, 3.6, 4.0,
                          1.4, 2.7, 3.2, 4.5))

That looks like this:

data

I want to filter this down into a new table, call it b.
The logic for filtering would be for each unique(TYPE) only keep the last row for each unique(SEQ). I know I can get this done with some for loops but I would like to avoid that.

The expected output for the sample data would be:

expected output

If anyone has any suggestions of efficient ways to filter, I am all ears. Thanks!

Bear
  • 662
  • 1
  • 5
  • 20

2 Answers2

2

You want to group by using both groups, and then use .N to select the last line within each subset of data, ie .SD :

R> d[, .SD[.N], by=.(SEQ,TYPE)]
    SEQ TYPE CLASS
 1:   1    A   2.1
 2:   2    B   4.5
 3:   1    C   1.3
 4:   2    C   3.4
 5:   3    C   4.6
 6:   1    D   2.5
 7:   2    D   3.6
 8:   3    D   1.4
 9:   4    D   2.7
10:   5    D   4.5
R> 
Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
1

You could also try:

library(dplyr)
d %>% group_by(TYPE, SSEQ) %>% filter(row_number() == max(row_number()))
#     SEQ TYPE  CLASS
# 1     1 A       2.1
# 2     2 B       4.5
# 3     1 C       1.3
# 4     2 C       3.4
# 5     3 C       4.6
# 6     1 D       2.5
# 7     2 D       3.6
# 8     3 D       1.4
# 9     4 D       2.7
#10     5 D       4.5
AndS.
  • 7,748
  • 2
  • 12
  • 17