1

I have a grouped data frame from my big dataset with ~ 800 columns and ~ 2.5 million records. I'm trying to create a row means columns for only 5-10 columns each but, not sure why, I keep getting NA as means for all rows.

Here's what I tried:

clean_bmk <- clean_bmk %>% 
                rowwise() %>%
                mutate(
                       BMK_Mean_Strategic = mean(!!strategic, na.rm = T),
                       BMK_Mean_DiffChange = mean(!!diffchange, na.rm = T),
                       BMK_Mean_Failure = mean(!!failure, na.rm = T),
                       BMK_Mean_Narrow = mean(!!narrow, na.rm = T),
                       BMK_R1_Performance = mean(!!performance_vars, na.rm=T),
                       BMK_R2_Promotion = mean(!!promote_vars, na.rm=T),
                       BMK_R3_Derail = mean(!!derail_vars, na.rm=T))


class(clean_bmk)
[1] "grouped_df" "tbl_df"     "tbl"        "data.frame"

When i do this, all of the columns mutated are NA. But, the following works:

clean_bmk$Strategic_Mean <- rowMeans(clean_bmk[,strategic], na.rm=T)

not sure why, and how can I make a function such that I can only send the list of vars that contains the column names, and mutates the column in the dataframe?

for example:

strategic <- c("column1", "column15", "column27")

and similar with other variables like diffchange, failure, etc.

I tried to do dput(clean_bmk) to share the data with you, but since the dataset is big, I couldn't get it. I'm guessing because it's a grouped_df, I couldn't use [[ nor sample() the dataset.

user1828605
  • 1,723
  • 1
  • 24
  • 63

1 Answers1

1

It would be inefficent to use rowwise, instead better option is rowMeans after selecting the columns of interest

library(dplyr)
clean_bmk %>% 
    ungroup %>%
    mutate(
      BMK_Mean_Strategic = rowMeans(select(., strategic),  na.rm = TRUE),
       BMK_Mean_DiffChange = rowMeans(select(., diffchange), na.rm = TRUE),
       BMK_Mean_Failure = rowMeans(select(., failure), na.rm = TRUE),
       BMK_Mean_Narrow = rowMeans(select(., narrow), na.rm = TRUE),
       BMK_R1_Performance = rowMeans(select(., performance_vars), na.rm=TRUE),
       BMK_R2_Promotion = rowMeans(select(., promote_vars), na.rm=TRUE),
       BMK_R3_Derail = rowMeans(select(., derail_vars), na.rm=TRUE))

Using a reproducible example

data(mtcars)
#v1 <- c('mpg', 'disp')
mtcars %>%
   transmute(newMean = rowMeans(select(., v1), na.rm = TRUE)) %>%
   head  
#                  newMean
#Mazda RX4           90.50
#Mazda RX4 Wag       90.50
#Datsun 710          65.40
#Hornet 4 Drive     139.70
#Hornet Sportabout  189.35
#Valiant            121.55
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I tried that. I get this error: `Error in rowMeans(select(., strategic), na.rm = T) : 'x' must be numeric` – user1828605 Feb 28 '20 at 18:17
  • @user1828605 Can you please check if the columns are numeric. I am assuming that the columns are numeric. E.g. this works with a reproducible example `v1 <- c('mpg', 'disp'); mtcars %>% transmute(newMean = rowMeans(select(., v1), na.rm = TRUE)) %>% head %>% pull(newMean)#[1] 90.50 90.50 65.40 139.70 189.35 121.55` – akrun Feb 28 '20 at 18:19
  • They are. This is what's confusing me a lot, and that's why I also included the `class` of the `df` in my question. I'm not sure, but I have a feeling that it also adds the grouped ID which is not numeric. So, whenever I try to run the rowMeans like you showed above, is it also taking the id? and trying to take mean? if that's the case, I don't know how to fix it. – user1828605 Feb 28 '20 at 18:21
  • 1
    @user1828605 iff you already have a grouped_df attribute, then do `clean_bmk %>% ungroup %>%..` and apply the rowMeans because for rowMeans, you don't need any group attribute – akrun Feb 28 '20 at 18:22
  • 1
    @akun. That's it!!!! That was the cause and ungroup() worked! That's awesome how quick you came up with that!! Thanks a bunch! – user1828605 Feb 28 '20 at 18:24