Here is my example data frame with the expected output.

df <- data.frame(index=c("3435pear","3435grape","3435apple","3435avocado","3435orange","3435kiwi","3436grapefruit","3436apple","3436banana","3436grape","3437apple","3437grape","3437avocado","3437orange","3438apple","3439apple","3440apple"),output=c("na","na","na","na","na","na","na","na","na","na","na","na","na","na","3435apple","3436apple","3437apple"))

                index    output
1        3435pear        na
2       3435grape        na
3       3435apple        na
4     3435avocado        na
5      3435orange        na
6        3435kiwi        na
7  3436grapefruit        na
8       3436apple        na
9      3436banana        na
10      3436grape        na
11      3437apple        na
12      3437grape        na
13    3437avocado        na
14     3437orange        na
15      3438apple 3435apple
16      3439apple 3436apple
17      3440apple 3437apple

As I go down the column, I want each row to return the index of the same fruit three occurrences earlier. If that fruit does not have three previous occurrences, it should return NA. So when the 4th apple appears it matches the 1st apple, when the 5th apple appears it matches the 2nd, and so on.

I tried to use rollapply, match, and tail to make this work, but I don't know how to reference the current row for the matching. In Excel I would use the LARGE, IF, and ROW functions to do this. Excel makes my computer grind for hours to calculate everything, and I know R could do this in minutes (seconds?).

Eric

1 Answer


You can do this:

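library(dplyr)

df %>%
  # Derive the fruit name by stripping the digits from index
  mutate(fruit = gsub("[0-9]", "", index)) %>%
  group_by(fruit) %>%
  # Within each fruit, take the index value from three rows earlier
  mutate(new_output = lag(index, 3)) %>%
  # Ungroup first: select() cannot drop a column that is still a grouping variable
  ungroup() %>%
  select(-fruit)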

Within each fruit group, new_output holds the index value from three rows earlier in that group, or NA when the group has fewer than three earlier rows. I left your output column in place and wrote my results to new_output so that you can compare the two.
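If you would rather see the same idea without dplyr, here is a minimal base R sketch. It is only an illustration of the grouped lag, not the only way to do it: the helper name `lag3` is made up for this example, and the data frame is rebuilt from the question's `index` column so the snippet runs on its own.

```r
# Rebuild the question's data (index column only) so this runs standalone.
df <- data.frame(
  index = c("3435pear", "3435grape", "3435apple", "3435avocado",
            "3435orange", "3435kiwi", "3436grapefruit", "3436apple",
            "3436banana", "3436grape", "3437apple", "3437grape",
            "3437avocado", "3437orange", "3438apple", "3439apple",
            "3440apple"),
  stringsAsFactors = FALSE
)

# Step 1: strip the digits to get the fruit name used as the grouping key.
fruit <- gsub("[0-9]", "", df$index)

# Step 2: hypothetical helper that shifts a vector down by three positions,
# padding the first three slots with NA (same idea as dplyr::lag(x, 3)).
lag3 <- function(x) c(rep(NA, 3), x)[seq_along(x)]

# Step 3: apply the shift separately within each fruit group.
# ave() returns the results in the original row order.
df$new_output <- ave(df$index, fruit, FUN = lag3)

print(df)
```

Only the apple rows get a value here, because apple is the only fruit that appears more than three times. Every other row stays NA.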

Hong
  • Wow it works! I'm not sure why it works, but it does. How did you learn to use R so well? Is it just from practice? – Eric Mar 29 '21 at 02:41
  • What this code does is (1) build groups by `fruit`, which you get by removing all digits from your `index` column, then (2) within each group, create a new column called `new_output` and assign it the value of `index` from three rows up. Since this is all done within the group, the rows in the `grape` group, for example (which has only 3 rows in total), are assigned `NA`. Anyhow, glad it helps. – Hong Mar 29 '21 at 02:46