Solution
I went with the solution provided by @thelatemail because I'm trying to stick with the tidyverse, and thus dplyr. I'm still new to R, so I'm taking baby steps and taking advantage of helper libraries. Thank you, everyone, for taking the time to contribute solutions.
df_new <- df_inh %>%
  select(
    isolate,
    Phenotype,
    which(
      sapply( ., function( x ) sd( x ) != 0 )
    )
  )
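For anyone on dplyr 1.0.0 or later, the same selection can probably also be written with where() inside select(). This is only a minimal sketch, not @thelatemail's answer: the toy data frame and its gene_a, gene_b, and gene_c columns are made up for illustration, and the is.numeric() check is my own assumption, added so that sd() is never called on the character isolate column.

```r
library(dplyr)

# Hypothetical toy data; gene_a, gene_b and gene_c are invented column names.
df_inh <- data.frame(
  isolate   = c("iso1", "iso2", "iso3"),
  Phenotype = c(1, 0, 1),
  gene_a    = c(1, 1, 1),  # constant column, standard deviation is 0
  gene_b    = c(0, 1, 0),  # varying column, standard deviation is not 0
  gene_c    = c(0, 0, 0),  # constant column, standard deviation is 0
  stringsAsFactors = FALSE
)

df_new <- df_inh %>%
  select(
    isolate,
    Phenotype,
    # keep numeric columns whose standard deviation is not 0;
    # assumes the numeric columns contain no NA values
    where(function(x) is.numeric(x) && sd(x) != 0)
  )

names(df_new)
```

On this toy data the result keeps isolate, Phenotype, and gene_b. Phenotype also passes the where() test here, but select() only includes each column once.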
Question
I'm trying to select columns if the column name is "isolate" or "Phenotype" or if the standard deviation of the column values is not 0.
I have tried the following code.
df_new <- df_inh %>%
  # remove isolate and Phenotype column for now, don't want to calculate their standard deviation
  select(
    -isolate,
    -Phenotype
  ) %>%
  # remove columns with all 1's or all 0's by calculating column standard deviation
  select_if(
    function( col ) return( sd( col ) != 0 )
  ) %>%
  # add back the isolate and Phenotype columns
  select(
    isolate,
    Phenotype
  )
I also tried this:
df_new <- df_inh %>%
  select_if(
    function( col ) {
      if ( col == 'isolate' | col == 'Phenotype' ) {
        return( TRUE )
      }
      else {
        return( sd( col ) != 0 )
      }
    }
  )
I can select columns by standard deviation or by column name; however, I cannot do both simultaneously.