1

I have a data looks like below, I would like to skip 2 rows after max index of certain types (3 and 4). For example, I have two 4s in my table, but I only need to remove 2 rows after the second 4. Same for 3, I only need to remove 2 rows after the third 3.

-----------------
|  grade | type |
-----------------
|   93   |   2  |
-----------------
|   90   |   2  |
-----------------
|   54   |   2  |
-----------------
|   36   |   4  |
-----------------
|   31   |   4  |
-----------------
|   94   |   1  |
-----------------
|   57   |   1  |
-----------------
|   16   |   3  |
-----------------
|   11   |   3  |
-----------------
|   12   |   3  |
-----------------
|   99   |   1  |
-----------------
|   99   |   1  |
-----------------
|    9   |   3  |
-----------------
|   10   |   3  |
-----------------
|   97   |   1  |
-----------------
|   96   |   1  |
-----------------

The desired output would be:

-----------------
|  grade | type |
-----------------
|   93   |   2  |
-----------------
|   90   |   2  |
-----------------
|   54   |   2  |
-----------------
|   36   |   4  |
-----------------
|   31   |   4  |
-----------------
|   16   |   3  |
-----------------
|   11   |   3  |
-----------------
|   12   |   3  |
-----------------
|    9   |   3  |
-----------------
|   10   |   3  |
-----------------

Here is the code of my example:

data <- data.frame(grade = c(93,90,54,36,31,94,57,16,11,12,99,99,9,10,97,96), type = c(2,2,2,4,4,1,1,3,3,3,1,1,3,3,1,1))

Could anyone give me some hints on how to approach this in R? Thanks a bunch in advance for your help and your time!

monckeyyL
  • 105
  • 2
  • 9
  • 1
    Can you turn your example into code so I can run some tests? – Juan C Nov 02 '22 at 22:20
  • `data.frame(grade = c(93, 90, 54, 36, 31, 94, 57, 16, 11, 12, 99, 99), type = c(2, 2, 2, 4, 4, 1, 1, 3, 3, 3, 1, 1))` would be helpful to have in the OP – Jon Spring Nov 02 '22 at 22:21
  • data[-c(max(which(data$type==3))+1:2,max(which(data$type==4))+1:2),] – Ric Nov 02 '22 at 22:30

3 Answers3

2
data[-c(max(which(data$type==3))+1:2,max(which(data$type==4))+1:2),]

#    grade type
# 1     93    2
# 2     90    2
# 3     54    2
# 4     36    4
# 5     31    4
# 8     16    3
# 9     11    3
# 10    12    3
Ric
  • 5,362
  • 1
  • 10
  • 23
  • Thanks a bunch Ric! I just ran some tests on my end and it works fine when a series of 3s appears once, but when more than one series of 3s appear it seems to only keep the maximum one. Is there a I can keep both? For example, use this data df <- data.frame(grade = c(93,90,54,36,31,94,57,16,11,12,99,99,45,53,66,43), type = c(2,2,2,4,4,1,1,3,3,3,1,1,3,3,1,1)), my desired output would be df <- data.frame(grade = c(93,90,54,36,31,16,11,12,45,53), type = c(2,2,2,4,4,3,3,3,3,3)). – monckeyyL Nov 02 '22 at 23:43
  • That is a sufficentty different question enough to deserve a new post. I only will say that grouping consecutive values is done by `rle` or `rleid` in some answers like [this](https://stackoverflow.com/questions/27077228/create-counter-of-consecutive-runs-of-a-certain-value/27077300#27077300) however you problem is tougher, – Ric Nov 03 '22 at 00:31
1

Using some indexing:

data[-(nrow(data) - match(c(3,4), rev(data$type)) + 1 + rep(1:2, each=2)),]
#   grade type
#1     93    2
#2     90    2
#3     54    2
#4     36    4
#5     31    4
#8     16    3
#9     11    3
#10    12    3

Or more generically:

vals <- c(3,4)
data[-(nrow(data) - match(vals, rev(data$type)) + 1 + rep(1:2, each=length(vals))),]

The logic is to match the first instance of each value to the reversed values in the column, then spin that around to give the original row index, then add 1 and 2 to the row indexes, then drop these rows.

thelatemail
  • 91,185
  • 12
  • 128
  • 188
0

Similar to Ric, but I find it a bit easier to read (way more verbose, though):

idx = data %>% mutate(id = row_number()) %>%
filter(type %in% 3:4) %>% group_by(type) %>% filter(id == max(id)) %>% pull(id)
data[-c(idx + 1, idx + 2),]
Juan C
  • 5,846
  • 2
  • 17
  • 51