0

I have a tibble:

> ipTraceSummary2
# A tibble: 11,839 × 6
   DstPo Protocol IntervalStart     n intervalLength  rate
   <int> <chr>    <chr>         <int>          <dbl> <dbl>
 1    23 TCP      15:31:00          7           10   0.7  
 2    23 TCP      15:31:10          4           10   0.4  
 3    23 TCP      15:31:20          5           10   0.5  
 4    23 TCP      15:31:30          3           10   0.3  
 5    23 TCP      15:31:40          5           10   0.5  
 6    23 TCP      15:31:50          4           10   0.4  
 7    23 TCP      Post             26           39.5 0.658
 8    23 TCP      Pre              37           60.0 0.617
 9    23 TELNET   15:31:00          6           10   0.6  
10    23 TELNET   15:31:10          4           10   0.4  
# … with 11,829 more rows
> 

for which I would like to create a new column containing the rate in the IntervalStart=="Pre" row for the matching DstPo and Protocol.

It would seem like this should be something like

ipTraceSummary2 %>%
  mutate( baseline = pull((ipTraceSummary2 %>% filter(`DstPo` == DstPo, `Protocol` == Protocol, `IntervalStart` == "Pre"))[1,"rate"]) )

but that populates baseline with the first Pre rate encountered, not matching on DstPo and Protocol. How can I get filter() to pick up the DstPo and Protocol from the row mutate is currently processing?

Ian
  • 1,507
  • 3
  • 21
  • 36
  • 1
    `group_by()` and `first()` are your friends. – Limey Jun 17 '22 at 08:59
  • Does this answer your question? [Assign max value of group to all rows in that group](https://stackoverflow.com/questions/56315103/assign-max-value-of-group-to-all-rows-in-that-group) – Limey Jun 17 '22 at 08:59
  • Think about creating a new table and then joining it to `ipTraceSummary2` if you want to add summary values at specific corresponding rows. – danlooo Jun 17 '22 at 09:01
  • @Limey Thanks for that. Seems close, but I still can't figure out what the rhs of the mutate() needs to be to pick up that other row within the current group. – Ian Jun 17 '22 at 09:15
  • This is because your desired result is not tidy aka 3NF normalized. One use `mutate` only to calculate a new value for every row using values of other columns, correspondingly – danlooo Jun 17 '22 at 09:53

1 Answers1

1

How about this?

Depending on where the "Pre" line appears in each combination of "DstPro" and "Protocol", you may need to change the fill() .direction`?

library(dplyr)

ipTraceSummary2 %>% 
  group_by(DstPo, Protocol) %>% 
  mutate(baseline = ifelse(IntervalStart == "Pre", DstPo, NA)) %>% 
  tidyr::fill(baseline, .direction = "updown")
Tech Commodities
  • 1,884
  • 6
  • 13