I have a grouped data frame from a large dataset with about 800 columns and about 2.5 million records. I'm trying to create several row-mean columns, each computed over only 5-10 columns, but for some reason I keep getting NA as the mean for every row.
Here's what I tried:
clean_bmk <- clean_bmk %>%
  rowwise() %>%
  mutate(
    BMK_Mean_Strategic  = mean(!!strategic, na.rm = T),
    BMK_Mean_DiffChange = mean(!!diffchange, na.rm = T),
    BMK_Mean_Failure    = mean(!!failure, na.rm = T),
    BMK_Mean_Narrow     = mean(!!narrow, na.rm = T),
    BMK_R1_Performance  = mean(!!performance_vars, na.rm = T),
    BMK_R2_Promotion    = mean(!!promote_vars, na.rm = T),
    BMK_R3_Derail       = mean(!!derail_vars, na.rm = T))
class(clean_bmk)
[1] "grouped_df" "tbl_df" "tbl" "data.frame"
When I do this, all of the mutated columns are NA. However, the following works:
clean_bmk$Strategic_Mean <- rowMeans(clean_bmk[,strategic], na.rm=T)
I'm not sure why the first version fails. How can I write a function that takes only the vector of column names and adds the corresponding mean column to the data frame?
For example:
strategic <- c("column1", "column15", "column27")
and similarly for the other variables, such as diffchange, failure, etc.
I tried to run dput(clean_bmk) to share the data with you, but the dataset is too big for me to get usable output. I'm guessing that because it's a grouped_df, I couldn't use [[ or sample() on the dataset either.
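Since I can't share the real data, here is a minimal sketch with a small made-up grouped data frame that should show the same behaviour. The object name `toy`, the `id` grouping column, and all values are invented for illustration; only `strategic` and the column names in it come from my example above. I'm assuming dplyr 1.0 or later, and the `suppressWarnings()` call is only there to keep the output quiet.

```r
library(dplyr)

# Toy stand-in for clean_bmk: the id column and all values are made up.
toy <- tibble(
  id       = c(1, 1, 2),
  column1  = c(1, 2, NA),
  column15 = c(3, 4, 5),
  column27 = c(5, 6, 7)
) %>%
  group_by(id)

# Character vector of column names, as in the question.
strategic <- c("column1", "column15", "column27")

# Same pattern as my attempt: rowwise() followed by mean(!!strategic).
attempt <- toy %>%
  rowwise() %>%
  mutate(BMK_Mean_Strategic = suppressWarnings(mean(!!strategic, na.rm = TRUE)))

# Base R version that gives the row means I expect.
expected <- rowMeans(as.data.frame(toy)[, strategic], na.rm = TRUE)

print(attempt$BMK_Mean_Strategic)
print(expected)
```

With this toy data, the `mutate()` version should come back as NA for every row, while `rowMeans()` gives the per-row means over the three columns, which is the result I'm after.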