Filter data for same date but keep only value with high frequency

Question

   Date_Time           wind_cardinal_direction_set_1d weather_condition_set_1d     n
   <dttm>              <chr>                          <chr>                    <int>
 1 2015-01-01 01:00:00 N                              Fog                          1
 2 2015-01-01 01:00:00 N                              Mist                         2
 3 2015-01-01 02:00:00 N                              Fog                          2
 4 2015-01-01 02:00:00 N                              Mist                         1
 5 2015-01-01 03:00:00 N                              Fog                          3
 6 2015-01-01 04:00:00 N                              Mist                         3
 7 2015-01-01 05:00:00 N                              Mist                         3
 8 2015-01-01 06:00:00 N                              Mist                         3
 9 2015-01-01 07:00:00 N                              Fog                          2
10 2015-01-01 07:00:00 N                              Mist                         1
# ... with 6,798 more rows

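For each Date_Time value, I want to keep only the row with the highest value of n. This is what I tried, but it does not work, because it compares n with itself and the condition is never true:

df_cat %>%  filter(df_cat$n>df_cat$n,)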

An example of expected output / result might improve this question. — Kjartan, Sep 12 '19 at 13:08

s__ · Accepted Answer · 2019-09-12T13:36:42.730

Welcome to SO! It seems you like dplyr, so here is a solution:

library(dplyr)
df_cat %>%   
  group_by(Date_Time) %>%    # group by date
  summarise(n = max(n)) %>%  # get the max values
  left_join(df_cat) %>%      # fetch the other columns
  # order them
  select(Date_Time,wind_cardinal_direction_set_1d,weather_condition_set_1d, n)
Joining, by = c("Date_Time", "n")
# A tibble: 7 x 4
  Date_Time           wind_cardinal_direction_set_1d weather_condition_set_1d     n
  <fct>               <fct>                          <fct>                    <int>
1 2015-01-01 01:00:00 N                              Mist                         2
2 2015-01-01 02:00:00 N                              Fog                          2
3 2015-01-01 03:00:00 N                              Fog                          3
4 2015-01-01 04:00:00 N                              Mist                         3
5 2015-01-01 05:00:00 N                              Mist                         3
6 2015-01-01 06:00:00 N                              Mist                         3
7 2015-01-01 07:00:00 N                              Fog                          2

Or, thanks to Ronak Shah, you can do it more simply:

df_cat %>% group_by(Date_Time) %>% slice(which.max(n))

With data:

df_cat <- read.table(text ='Date_Time              wind_cardinal_direction_set_1d weather_condition_set_1d     n
 1 "2015-01-01 01:00:00" N                              Fog                          1
 2 "2015-01-01 01:00:00" N                              Mist                         2
 3 "2015-01-01 02:00:00" N                              Fog                          2
 4 "2015-01-01 02:00:00" N                              Mist                         1
 5 "2015-01-01 03:00:00" N                              Fog                          3
 6 "2015-01-01 04:00:00" N                              Mist                         3
 7 "2015-01-01 05:00:00" N                              Mist                         3
 8 "2015-01-01 06:00:00" N                              Mist                         3
 9 "2015-01-01 07:00:00" N                              Fog                          2
10 "2015-01-01 07:00:00" N                              Mist                         1', header = TRUE)
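
One difference between the two approaches above is how they treat ties. The join matches every row that reaches the group maximum, while which.max() returns only the first one. The sample data has no ties, so both give the same 7 rows here. The sketch below uses a small made-up data frame (`toy`, not from the question) to show the difference, with `filter(n == max(n))` standing in for the join since it also keeps all tied rows:

```r
library(dplyr)

# Hypothetical toy data (not from the question): 01:00 has a tie at n = 2.
toy <- data.frame(
  Date_Time = c("2015-01-01 01:00:00", "2015-01-01 01:00:00", "2015-01-01 02:00:00"),
  weather_condition_set_1d = c("Fog", "Mist", "Fog"),
  n = c(2L, 2L, 3L),
  stringsAsFactors = FALSE
)

# filter() keeps every row equal to the group maximum, so tied rows survive.
with_ties <- toy %>%
  group_by(Date_Time) %>%
  filter(n == max(n)) %>%
  ungroup()

# which.max() returns the position of the first maximum only,
# so each group yields exactly one row.
first_only <- toy %>%
  group_by(Date_Time) %>%
  slice(which.max(n)) %>%
  ungroup()

nrow(with_ties)   # 3
nrow(first_only)  # 2
```

If exactly one row per Date_Time is required, the slice() version is the safer choice. If tied conditions should all be kept, use the filter() or join version.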

Why do you need a join? You can select the `max` value by group: `df_cat %>% group_by(Date_Time) %>% slice(which.max(n))` — Ronak Shah, Sep 12 '19 at 13:34
Because I did not know that function. I really like SO for this! Thanks @RonakShah — s__, Sep 12 '19 at 13:37
