I am trying to find an efficient way to compute the minimum of `runtime_sec` over a subset of rows defined by the `hour` column, potentially using an anonymous function. Currently I take the long way: I create a new data frame and then join it back to the existing one. I would like to do this more efficiently, without having to save an intermediate data frame. I've been looking at how to do this with the map functions in purrr, but I'm having trouble understanding them. Apologies in advance if this is confusing; this is my first post here.
Existing df:
| index | hour | runtime_sec |
|-----: |-----:| -----------:|
| 1 | 6 | 50 |
| 1 | 7 | 100 |
| 1 | 8 | 120 |
| 1 | 9 | 90 |
| 1 | 10 | 100 |
| 1 | 11 | 100 |
| 2 | 10 | 100 |
Current code:
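
    library(dplyr)

    # Minimum runtime_sec per index, using only rows with 8 <= hour < 10
    df_min <- df %>%
      group_by(index) %>%
      filter(hour >= 8 & hour < 10) %>%
      summarize(min_ref = min(runtime_sec))

    # Attach the per-index minimum to every row of the original data
    df_join <- df %>%
      left_join(df_min, by = "index")
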
Desired output:
| index | hour | runtime_sec | min_ref |
|----: |----: | ----: | ----: |
| 1 | 6 | 50 | 90 |
| 1 | 7 | 100 | 90 |
| 1 | 8 | 120 | 90 |
| 1 | 9 | 90 | 90 |
| 1 | 10 | 100 | 90 |
| 1 | 11 | 100 | 90 |
| 2 | 10 | 100 | 100 |
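
For reference, here is a minimal sketch of what a single-pipeline version might look like, using a grouped `mutate()` with a logical subset instead of a separate summarize-and-join. It assumes the data frame is named `df` as above, and `min_in_window` is a hypothetical helper written for this example, not a dplyr function. The helper returns `NA` when a group has no rows in the window, because `min()` on an empty vector would otherwise return `Inf` with a warning.

```r
library(dplyr)

# Sample data copied from the "Existing df" table
df <- data.frame(
  index       = c(1, 1, 1, 1, 1, 1, 2),
  hour        = c(6, 7, 8, 9, 10, 11, 10),
  runtime_sec = c(50, 100, 120, 90, 100, 100, 100)
)

# Hypothetical helper (not part of dplyr): minimum of x over the rows where
# `keep` is TRUE, returning NA instead of Inf when no row qualifies
min_in_window <- function(x, keep) {
  if (any(keep)) min(x[keep]) else NA_real_
}

# Compute the windowed minimum per index and broadcast it to every row
df_out <- df %>%
  group_by(index) %>%
  mutate(min_ref = min_in_window(runtime_sec, hour >= 8 & hour < 10)) %>%
  ungroup()
```

With the window `hour >= 8 & hour < 10` taken from the current code, index 1 gets 90 and index 2 gets `NA`, since its only row is at hour 10. The desired output shows 100 for index 2, which would correspond to an inclusive upper bound (`hour <= 10`); that reading is an assumption and not something the current code does.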