I have a data frame with a large number of math- and science-related items, and I want to remove all math-related variables.
The variable names follow no consistent naming scheme for either math or science, so it's hard to search and select based on variable name. However, the variable labels describe what each variable represents. I essentially want to remove all variables whose labels contain the word "math". I tried the following code:
library(dplyr)
library(Hmisc)
# Sample data frame:
M <- c(1, 2)
S <- c(3, 4)
old_df <- data.frame(M, S)
label(old_df$M) <- "My Mathematics Variable"
label(old_df$S) <- "My Science Variable"
# dplyr syntax:
new_df <- old_df %>% select( -contains(Hmisc::label(.) == "MATH" ) )
This uses the Hmisc::label() function to retrieve a vector of labels.
Sample output of the label() function:
> label(old_df)
                        M                         S 
"My Mathematics Variable"     "My Science Variable" 
> str(label(old_df))
Named chr [1:2] "My Mathematics Variable" "My Science Variable"
- attr(*, "names")= chr [1:2] "M" "S"
I need a way to search through the labels and find the string "math" within them. I tried coercing the label vector to a matrix and to a data frame, but I still can't figure out how to search it and retrieve the matching variable names. Any suggestions that will get this to work are welcome.
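For reference, here is a minimal sketch of one way the search could work. It assumes a case-insensitive match on "math" is what is wanted, and that the labels were set with Hmisc::label() as above. The names `labels`, `is_math`, and `math_vars` are illustrative only, not part of any package.

```r
library(Hmisc)

# Sample data frame from the question
old_df <- data.frame(M = c(1, 2), S = c(3, 4))
label(old_df$M) <- "My Mathematics Variable"
label(old_df$S) <- "My Science Variable"

# Named character vector: names are column names, values are labels
labels <- label(old_df)

# Case-insensitive search for "math" inside each label (assumed matching rule)
is_math <- grepl("math", labels, ignore.case = TRUE)

# Column names whose label mentions "math"
math_vars <- names(labels)[is_math]

# Drop those columns; drop = FALSE keeps a data frame even if one column remains
new_df <- old_df[, !is_math, drop = FALSE]
```

Because label() returns a named vector, the match can be done on the values and mapped back to column names through names(). With dplyr loaded, the same removal could presumably be written as `old_df %>% select(-all_of(math_vars))`.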