
I have a very large data set with results and dates. Here is a small subset (the real data has many more rows and columns, with different names):

 result_1    date_1 result_2    date_2 result_3    date_3 result_4    date_4
1        1 12.8.2020        4 13.8.2020        2 15.8.2020        1 20.8.2020
2        3 15.8.2020        3 14.8.2020        5 17.8.2020        2 21.8.2020

I want to convert some of the columns to numeric, depending on the column names. I thought of selecting the columns with a regex, as follows:

data$"result.*" <- as.numeric(data$"result\.*")

but it produces an error:

Error in `$<-.data.frame`(`*tmp*`, "result.*", value = numeric(0)) : 
  replacement has 0 rows, data has 2

I could also use mutate or some sort of loop, but I'm sure there's a more efficient way to do this, especially since the data set is huge.

Amidavid
  • Something like `data[grepl('result', names(data))] <- lapply(data[grepl('result', names(data))], as.numeric)` – Sotos Sep 01 '20 at 12:10
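
The one-liner in the comment above can be spelled out as a short base R sketch. It also shows why the original attempt fails: `$` looks a column up by name and never treats the name as a pattern. The toy data frame and the variable `result_cols` are assumptions made for illustration, and the anchored pattern `^result_` is just one possible choice.

```r
# Toy data mirroring the question; the result columns are stored as character
data <- data.frame(
  result_1 = c("1", "3"),
  date_1   = c("12.8.2020", "15.8.2020"),
  result_2 = c("4", "3"),
  date_2   = c("13.8.2020", "14.8.2020"),
  stringsAsFactors = FALSE
)

# `$` does not interpret its argument as a regex, so no column is found
# and the lookup returns NULL; as.numeric(NULL) then has length 0
no_match <- data$"result.*"
is.null(no_match)

# Build a logical index from the column names, then convert each match
result_cols <- grepl("^result_", names(data))
data[result_cols] <- lapply(data[result_cols], as.numeric)

# Inspect the resulting column classes
sapply(data, class)
```

Computing the index once and reusing it avoids running `grepl` twice, which is the only difference from the comment's version.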

2 Answers

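library(dplyr)  # provides tibble(), the mutate_*() helpers and %>%

# Example data: every column starts out numeric
dat <- tibble(result_1 = c(1, 2),
              date_1   = c(2, 3),
              result_2 = c(3, 4),
              date_2   = c(34, 3))

dat %>%
  # Mimic the question's data by turning every column into character
  mutate_if(is.numeric, as.character) %>%
  # Convert only the columns whose names match "result" back to numeric
  mutate_at(vars(matches("result")), as.numeric)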
sambold

The other answer works, but note that mutate_at and mutate_if have been superseded by the across function in dplyr:

dat <- data.frame(result_1 = c("4", "2"), date_1 = letters[1:2], result_2 = c("2", "3"))

tidyverse

library(dplyr)

dat %>% mutate(across(matches("result_.*"), as.numeric))
#>   result_1 date_1 result_2
#> 1        4      a        2
#> 2        2      b        3
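
The printed output looks the same before and after the conversion, so it may help to check the column classes directly. This is a minimal sketch that assumes dplyr 1.0.0 or later is installed; the name `converted` is made up for illustration, and `stringsAsFactors = FALSE` is set explicitly so the example does not depend on the R version's default.

```r
library(dplyr)

dat <- data.frame(result_1 = c("4", "2"),
                  date_1   = letters[1:2],
                  result_2 = c("2", "3"),
                  stringsAsFactors = FALSE)

# Same conversion as above, stored under a new (hypothetical) name
converted <- dat %>% mutate(across(matches("^result_"), as.numeric))

# Compare column classes before and after the conversion
sapply(dat, class)
sapply(converted, class)
```

Only the columns matched by the pattern should change class; the date column stays character.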

data.table

library(data.table)

dat <- data.table(dat)
cols <- grep("result_.*", names(dat), value=TRUE)
dat[, (cols) := lapply(.SD, as.numeric), .SDcols=cols]
dat
#>    result_1 date_1 result_2
#> 1:        4      a        2
#> 2:        2      b        3
Vincent
  • Do you notice that the OP has difficulty selecting the columns using a regex? Why would you use `a|c` instead of something the OP could lean on, e.g. `results`? – Onyambu Sep 01 '20 at 12:43
  • I disagree with your assessment of the problem. The problem in the OP was not the regex, but the selection mechanism used to leverage the regex. The OP seems to know how to build regexes just fine. That said, I agree that my answer would have been better if it stuck closer to the original, and I have edited accordingly. – Vincent Sep 01 '20 at 19:47
  • If OP had ideas on how to select, then they would not have done `data$"..."`. – Onyambu Sep 01 '20 at 20:20
  • Exactly why I illustrated the use of `grep`. – Vincent Sep 02 '20 at 01:12