dplyr :conditional pipeline while grouping

Question

I have a data frame of hourly weather data. I want to group the hours into days and assign values to the new daily data frame when a condition is met and only the mean of the rows that meet that condition: In the daily data frame I don't want to use 0 values or non-zero values that do not meet the condition HumH >50 | WetH>30

My data looks like this (sorry I couldn't use a smaller subset): I would like to have

for day 1 --> only the mean(of the rows 12:18)

for day 2 --> 0 (since neither condition is met)

structure(list(Day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Temp = c(12, 12, 12, 
11, 10, 9, 9, 9, 9, 9, 8, 7, 6, 6, 6, 6, 6, 8, 11, 12, 13, 14, 
15, 16, 16, 16, 16, 14, 13, 11, 10, 9, 8, 7, 7, 6, 4, 3, 3, 2, 
4, 8, 11, 13, 14, 16, 17, 18), RH = c(69L, 68L, 71L, 73L, 76L, 
81L, 78L, 77L, 76L, 78L, 80L, 83L, 88L, 90L, 91L, 94L, 93L, 91L, 
78L, 67L, 62L, 56L, 49L, 39L, 36L, 34L, 34L, 36L, 38L, 42L, 48L, 
50L, 53L, 58L, 59L, 63L, 66L, 69L, 71L, 73L, 71L, 60L, 50L, 43L, 
41L, 36L, 34L, 31L), HumH = c(NA, 0, 0, 0, 0, 9, 0, 0, 0, 0, 
8, 15, 21, 27, 33, 39, 45, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0), 
    WetH = c(NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 12, 
    18, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-48L), class = "data.frame")

Then the 10th and 11thn days are the following

I would like to have

for day 10 --> only the mean(of the rows 9:16)

for day 11 --> 0 (since neither condition is met)

structure(list(Day = c(10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 11, 11, 11, 
11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 11, 12), Temp = c(18, 17, 16, 14, 13, 12, 11, 
9, 8, 7, 7, 6, 6, 6, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 10, 
9, 9, 9, 9, 9, 8, 8, 8, 8, 7, 7, 7, 8, 9, 10, 12, 13, 13, 14, 
15, 15), RH = c(36L, 39L, 42L, 48L, 55L, 65L, 69L, 76L, 81L, 
85L, 84L, 90L, 90L, 92L, 90L, 88L, 89L, 89L, 90L, 86L, 83L, 83L, 
81L, 78L, 75L, 76L, 79L, 87L, 89L, 87L, 87L, 78L, 72L, 70L, 70L, 
76L, 77L, 82L, 86L, 84L, 81L, 72L, 62L, 55L, 51L, 50L, 46L, 44L
), HumH = c(0, 0, 0, 0, 13, 0, 0, 0, 8, 15, 22, 28, 34, 40, 47, 
55, 63, 71, 79, 87, 96, 105, 114, 0, 0, 0, 0, 9, 18, 27, 36, 
0, 0, 0, 0, 0, 0, 7, 14, 22, 31, 0, 0, 0, 0, 0, 0, 0), WetH = c(0, 
0, 0, 0, 14, 0, 0, 0, 0, 0, 0, 0, 0, 6, 12, 0, 0, 8, 16, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0)), row.names = c(NA, -48L), class = "data.frame")

And my expected output is:

structure(list(Day = c(1, 2, 3, 4, 5, 6), Temp = c(6.2, 0, 0, 
0, 0, 0), RH = c(75.3333333333333, 49.8333333333333, 45.625, 
51.6666666666667, 52.75, 56.2083333333333)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame")

This is what I have done so far, but I cannot add the conditions:

weather_day <- weather_hour %>%
                  group_by(as.numeric(Day)) %>%
                  group_map(~head(.x,24L)) %>% 
                  summarize(Temp_day=mean(Temp),RH_day=mean(RH))}

I want the values of Temp and RH of the daily data frame to be for each group. Here is some pseudocode:

if(HumH <50 | WetH < 30){
    Temp= 0 & RH= 0
 }
else{
 Temp = mean(of consecutive non-zeros that lead to the condition) &
 RH = mean(of consecutive non-zeros that lead to the condition)
}

Your `dput()` is giving me an error – Vinícius Félix Nov 29 '22 at 12:48 — Vinícius Félix, Nov 29 '22 at 12:48
@ViníciusFélix it should works now – pantelis kleanth Nov 29 '22 at 13:42 — pantelis kleanth, Nov 29 '22 at 13:42

score 0 · Answer 1 · answered Nov 29 '22 at 18:20

I think I did not understand your rules, here is what I got:

weather_hour  <-
structure(list(Day = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
                       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
                       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Temp = c(12, 12, 12, 
                                                                           11, 10, 9, 9, 9, 9, 9, 8, 7, 6, 6, 6, 6, 6, 8, 11, 12, 13, 14, 
                                                                           15, 16, 16, 16, 16, 14, 13, 11, 10, 9, 8, 7, 7, 6, 4, 3, 3, 2, 
                                                                           4, 8, 11, 13, 14, 16, 17, 18), RH = c(69L, 68L, 71L, 73L, 76L, 
                                                                                                                 81L, 78L, 77L, 76L, 78L, 80L, 83L, 88L, 90L, 91L, 94L, 93L, 91L, 
                                                                                                                 78L, 67L, 62L, 56L, 49L, 39L, 36L, 34L, 34L, 36L, 38L, 42L, 48L, 
                                                                                                                 50L, 53L, 58L, 59L, 63L, 66L, 69L, 71L, 73L, 71L, 60L, 50L, 43L, 
                                                                                                                 41L, 36L, 34L, 31L), HumH = c(NA, 0, 0, 0, 0, 9, 0, 0, 0, 0, 
                                                                                                                                               8, 15, 21, 27, 33, 39, 45, 53, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                                                                                                                                               0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0), 
               WetH = c(NA, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 12, 
                        18, 24, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
                                                                               -48L), class = "data.frame")


weather_day  <- 
  structure(list(Day = c(1, 2, 3, 4, 5, 6),
                 Temp = c(6.2, 0, 0, 0, 0, 0),
                 RH = c(75.3333333333333, 49.8333333333333, 45.625, 51.6666666666667, 52.75, 56.2083333333333)),
            row.names = c(NA,  -6L),
            class = c("tbl_df", "tbl", "data.frame")) 


weather_hour %>% 
  #Filtering values with HumH < 50 or WetH < 30
  filter(HumH < 50 | WetH < 30) %>%
  #Grouping by Day
  group_by(Day) %>%   
  summarise(
    RH = mean(RH,na.rm = TRUE),
    Temp = mean(Temp,na.rm = TRUE)
  ) %>% 
  #Comparing to your expected results
  left_join(weather_day ,by = "Day",suffix = c("_real","_prev")) 

# A tibble: 2 x 5
    Day RH_real Temp_real Temp_prev RH_prev
  <dbl>   <dbl>     <dbl>     <dbl>   <dbl>
1     1    75.6      9.74       6.2    75.3
2     2    49.8     10.2        0      49.8

Thank you. I will edit the post and include an extra data set. I hope that will clarify my problem — pantelis kleanth, Nov 29 '22 at 19:30

dplyr :conditional pipeline while grouping

1 Answers1