
I have a data frame with a large number of math- and science-related items, and I want all math-related variables removed.

The variable names follow no consistent naming scheme for either math or science, so it's hard to search and select based on variable name. However, the variable labels describe what each variable represents. I essentially want to remove all variables whose labels contain the word "math". I tried the following code:

library(dplyr)
library(Hmisc) 

# Sample data frame:
M <- c(1, 2)
S <- c(3, 4)
old_df <- data.frame(M, S)
label(old_df$M) <- "My Mathematics Variable"
label(old_df$S) <- "My Science Variable"

#dplyr syntax:
new_df <- old_df %>% select( -contains(hmisc::label(.) == "MATH" ) )

Here I use the Hmisc::label() function to retrieve a vector of labels.

Sample output of the label() function:

> label(old_df)
                        M                         S 
"My Mathematics Variable"     "My Science Variable" 
> str(label(old_df))
 Named chr [1:2] "My Mathematics Variable" "My Science Variable"
 - attr(*, "names")= chr [1:2] "M" "S"

I need a way to search through the labels and find the string "math" within them. I tried coercing the result to a matrix and to a data frame, but I still can't figure out how to search it and retrieve the variable names. Any suggestions that will get this to work are welcome.

Pål Bjartan

1 Answer

You mean something like this? (UPDATED to more closely map grepl to your example.)

library(Hmisc)
library(dplyr)

Hmisc::label(mtcars$mpg) <- "Miles per Gallon" # grepl WILL catch this
Hmisc::label(mtcars$hp) <- "Not here" # nope
Hmisc::label(mtcars$qsec) <- "MILES all caps here" # nope, unless you set ignore.case = TRUE
Hmisc::label(mtcars$drat) <- "later in the label Miles is here" # yepp

# Case-sensitive match, so qsec ("MILES") is kept in the output below;
# add ignore.case = TRUE to grepl() to drop it as well.
mtcars %>% select_if(.predicate = !(grepl("Miles", Hmisc::label(.))))
#>                     cyl  disp  hp    wt  qsec vs am gear carb
#> Mazda RX4             6 160.0 110 2.620 16.46  0  1    4    4
#> Mazda RX4 Wag         6 160.0 110 2.875 17.02  0  1    4    4
#> Datsun 710            4 108.0  93 2.320 18.61  1  1    4    1
#> Hornet 4 Drive        6 258.0 110 3.215 19.44  1  0    3    1
#> Hornet Sportabout     8 360.0 175 3.440 17.02  0  0    3    2
#> Valiant               6 225.0 105 3.460 20.22  1  0    3    1
#> Duster 360            8 360.0 245 3.570 15.84  0  0    3    4
#> Merc 240D             4 146.7  62 3.190 20.00  1  0    4    2
#> Merc 230              4 140.8  95 3.150 22.90  1  0    4    2
#> Merc 280              6 167.6 123 3.440 18.30  1  0    4    4
#> Merc 280C             6 167.6 123 3.440 18.90  1  0    4    4
#> Merc 450SE            8 275.8 180 4.070 17.40  0  0    3    3
#> Merc 450SL            8 275.8 180 3.730 17.60  0  0    3    3
#> Merc 450SLC           8 275.8 180 3.780 18.00  0  0    3    3
#> Cadillac Fleetwood    8 472.0 205 5.250 17.98  0  0    3    4
#> Lincoln Continental   8 460.0 215 5.424 17.82  0  0    3    4
#> Chrysler Imperial     8 440.0 230 5.345 17.42  0  0    3    4
#> Fiat 128              4  78.7  66 2.200 19.47  1  1    4    1
#> Honda Civic           4  75.7  52 1.615 18.52  1  1    4    2
#> Toyota Corolla        4  71.1  65 1.835 19.90  1  1    4    1
#> Toyota Corona         4 120.1  97 2.465 20.01  1  0    3    1
#> Dodge Challenger      8 318.0 150 3.520 16.87  0  0    3    2
#> AMC Javelin           8 304.0 150 3.435 17.30  0  0    3    2
#> Camaro Z28            8 350.0 245 3.840 15.41  0  0    3    4
#> Pontiac Firebird      8 400.0 175 3.845 17.05  0  0    3    2
#> Fiat X1-9             4  79.0  66 1.935 18.90  1  1    4    1
#> Porsche 914-2         4 120.3  91 2.140 16.70  0  1    5    2
#> Lotus Europa          4  95.1 113 1.513 16.90  1  1    5    2
#> Ford Pantera L        8 351.0 264 3.170 14.50  0  1    5    4
#> Ferrari Dino          6 145.0 175 2.770 15.50  0  1    5    6
#> Maserati Bora         8 301.0 335 3.570 14.60  0  1    5    8
#> Volvo 142E            4 121.0 109 2.780 18.60  1  1    4    2
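
Applied to the sample data from the question, the same idea can be sketched in base R. This is only a minimal sketch under one assumption: that the labels live in each column's "label" attribute, which is where Hmisc::label() is understood to store them. The helper `get_label` and the name `math_vars` are made up for this illustration.

```r
# Sample data from the question. Labels are set through the "label"
# attribute directly (assumed to be what Hmisc::label<- does under the hood).
old_df <- data.frame(M = c(1, 2), S = c(3, 4))
attr(old_df$M, "label") <- "My Mathematics Variable"
attr(old_df$S, "label") <- "My Science Variable"

# Hypothetical helper: return a column's label, or "" if it has none.
get_label <- function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) "" else as.character(lab)[1]
}

# Named character vector: one label per column.
labels <- vapply(old_df, get_label, character(1))

# Names of the columns whose label contains "math", ignoring case.
math_vars <- names(labels)[grepl("math", labels, ignore.case = TRUE)]

# Keep every other column; drop = FALSE keeps the result a data frame
# even when only one column is left.
new_df <- old_df[, !(names(old_df) %in% math_vars), drop = FALSE]
print(names(new_df))
```

Once `math_vars` holds the offending column names, a dplyr version along the lines of `select(old_df, -all_of(math_vars))` should give the same result, if you prefer to stay in a pipeline.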
Chuck P
  • What does `.predicate` do? Would it select any string containing these words? – Pål Bjartan May 14 '20 at 20:51
  • From the help file for `select_if`: "The variables for which .predicate is or returns TRUE are selected." It's a test that must return TRUE or FALSE, so in your case, if I understand it (data please, or at least a sample), you probably want `.predicate = !(grepl("MATH", Hmisc::label(.)))`. In the example I gave you, `mtcars %>% select_if(.predicate = !(grepl("Miles", Hmisc::label(.))))` removes just mpg based on the label. – Chuck P May 14 '20 at 21:10
  • This seems to be one step closer. It removed all items containing the full word "MATH". However, variables containing "**math**ematics" are still present. Is there a way to also filter out "math" within other words? – Pål Bjartan May 15 '20 at 00:12
  • I'm very confused. Are you sure you don't have a typo somewhere? On my system, given what little example you've shared: `suspects <- c("WEIGHT FOR MATHEMATICS TEACHER DATA", "WEIGHT FOR SCIENCE TEACHER DATA", "WEIGHT FOR MAT+SCI TEACHER DATA COMBIN")`. If I run `!(grepl("MAT", suspects))` I get back `FALSE TRUE FALSE`, which is what you say you want. – Chuck P May 15 '20 at 12:18
  • I must have screwed up the syntax somehow. I tried a new test sample, and it worked when setting `ignore.case = TRUE` in `grepl()` – Pål Bjartan May 15 '20 at 12:56
  • Okay glad to help and yes case might matter but you should be able to extend `grepl` as needed. – Chuck P May 15 '20 at 13:10
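
To make the case-sensitivity point from the comments concrete, here is a small sketch using the `suspects` vector quoted above. The variable names `case_sensitive`, `case_insensitive` and `short_pattern` are just illustrative; only base R's `grepl()` is used.

```r
suspects <- c("WEIGHT FOR MATHEMATICS TEACHER DATA",
              "WEIGHT FOR SCIENCE TEACHER DATA",
              "WEIGHT FOR MAT+SCI TEACHER DATA COMBIN")

# grepl() is case-sensitive by default, so lowercase "math"
# matches none of these upper-case labels.
case_sensitive <- grepl("math", suspects)

# With ignore.case = TRUE, "math" is found inside "MATHEMATICS".
case_insensitive <- grepl("math", suspects, ignore.case = TRUE)

# A shorter pattern also catches the abbreviated "MAT+SCI" label.
short_pattern <- grepl("mat", suspects, ignore.case = TRUE)

print(rbind(case_sensitive, case_insensitive, short_pattern))
```

Note that `grepl()` matches substrings, so "math" is found inside longer words without any extra work. The shorter the pattern, though, the more likely it is to hit unrelated labels (for example, "mat" would also match "format"), so it is worth eyeballing the matched labels before dropping columns.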