I am trying to summarise a data frame by grouping on the label column. I want to obtain means based on the following conditions:
- if all numbers are NA, return NA
- if the mean of all the numbers is 1 or lower, return 1
- if the mean of all the numbers is higher than 1, return the mean of the values in the group that are greater than 1
- everything else should be 100.

I managed to find the answer and my code now runs correctly: the first ifelse() statement needs is.na() instead of == NA, and that was the issue.

library(dplyr)  # provides %>%, group_by() and summarize()

label <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7)
sev <- c(NA,NA,NA,NA,1,0,1,1,1,NA,1,2,2,4,5,1,0,1,1,4,5)
Data2 <- data.frame(label,sev)

# The mean of an all-NA group with na.rm=TRUE is NaN, which is.na() catches
d <- Data2 %>%
        group_by(label) %>%
        summarize(sevmean = ifelse(is.na(mean(sev,na.rm=TRUE)),NA,
                                 ifelse(mean(sev,na.rm=TRUE)<=1,1,
                                        ifelse(mean(sev,na.rm=TRUE)>1,
                                               mean(sev[sev>1],na.rm=TRUE),100))))
MIH
  • probably `case_when()` is what you seek. – RLave Jul 13 '18 at 09:13
  • Never seen this function. After struggling with this for a bit, finally managed to find the problem myself. is.na() instead of ==NA in the first ifelse() statement is the answer – MIH Jul 13 '18 at 09:16
  • Thanks @RiccardoLavelli I will look it up anyways to learn something new and probably very useful! – MIH Jul 13 '18 at 09:16
  • If you find the solution please post it as an answer, it might be helpful for others too. – RLave Jul 13 '18 at 09:17
  • @MIH: To see why this is so, check this link: https://stats.stackexchange.com/questions/5686/what-is-the-difference-between-nan-and-na – rar Jul 13 '18 at 09:19

1 Answer

Your first condition is the issue here. If we remove the nested ifelse calls and keep only the first one, we get the same all-NA output:

Data2 %>%
   group_by(label) %>%
   summarise(sevmean = ifelse(mean(sev,na.rm=TRUE)==NaN,NA,1))

#  label sevmean
#  <dbl> <lgl>  
#1  1.00 NA     
#2  2.00 NA     
#3  3.00 NA     
#4  4.00 NA     
#5  5.00 NA     
#6  6.00 NA     
#7  7.00 NA     
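
A minimal base R sketch of why every group comes out as NA here (the variable name `m` is only for illustration): any comparison with NaN using == returns NA rather than TRUE or FALSE, and ifelse() returns NA wherever its test is NA. Because no element is ever filled in from the yes or no values, the result stays logical, which would explain the <lgl> column above.

```r
# Mean of an all-NA group: after dropping the NAs nothing is left, so the mean is NaN
m <- mean(c(NA_real_, NA_real_, NA_real_), na.rm = TRUE)

m == NaN                  # NA: == never returns TRUE for NaN
2 == NaN                  # NA as well, even for an ordinary number
is.nan(m)                 # TRUE: the reliable test for NaN
is.na(m)                  # TRUE: is.na() also treats NaN as missing

# With an NA test, ifelse() returns NA regardless of the yes/no values
ifelse(2 == NaN, NA, 1)   # NA
```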

I am not sure why you are checking for NaN, but if you want to do that, check it with is.nan() instead of ==:

Data2 %>%
  group_by(label) %>%
   summarize(sevmean = ifelse(is.nan(mean(sev,na.rm=TRUE)),NA,
                         ifelse(mean(sev,na.rm=TRUE)<=1,1,
                                ifelse(mean(sev,na.rm=TRUE)>1,
                                       mean(sev[sev>1],na.rm=TRUE),100))))


#  label sevmean
#  <dbl>   <dbl>
#1  1.00    NA   
#2  2.00    1.00
#3  3.00    1.00
#4  4.00    2.00
#5  5.00    3.67
#6  6.00    1.00
#7  7.00    4.50
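
As a possible alternative, the case_when() mentioned in the comments could express the same rules without nesting. This is only a sketch: the helper name `sev_summary` is made up for illustration, and NA_real_ is used so that every branch returns a double. Note that the final 100 branch of the nested version can never be reached, because once the mean is not NaN it is either <= 1 or > 1, so it is left out here.

```r
library(dplyr)

label <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6,7,7,7)
sev <- c(NA,NA,NA,NA,1,0,1,1,1,NA,1,2,2,4,5,1,0,1,1,4,5)
Data2 <- data.frame(label, sev)

# Hypothetical helper: returns one summary value for a group's values
sev_summary <- function(x) {
  m <- mean(x, na.rm = TRUE)               # NaN when every value is NA
  case_when(
    is.nan(m) ~ NA_real_,                  # all values missing
    m <= 1    ~ 1,                         # group mean of 1 or lower
    TRUE      ~ mean(x[x > 1], na.rm = TRUE)  # mean of the values above 1
  )
}

res <- Data2 %>%
  group_by(label) %>%
  summarise(sevmean = sev_summary(sev))
```

Computing the group mean once inside the helper also avoids repeating mean(sev,na.rm=TRUE) in every condition.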
Ronak Shah