
I have a data frame with 95 columns and want to batch-rename many of them with simple regexes, like the snippet below; there are ~30 such lines. Any column that doesn't match a search regex must be left untouched.

**** Example: names(tr) = c('foo', 'bar', 'xxx_14', 'xxx_2001', 'yyy_76', 'baz', 'zzz_22', ...) ****

I started out with a wall of 25 gsub()s - crude but effective:

names(tr) <- gsub('_1$',    '_R', names(tr))
names(tr) <- gsub('_14$',   '_I', names(tr))
names(tr) <- gsub('_22$',   '_P', names(tr))
names(tr) <- gsub('_50$',   '_O', names(tr))
... yada yada

@Joshua: mapply doesn't work; it turns out to be more complicated and hard to vectorize directly. names(tr) contains other columns, and when these patterns do occur, you cannot assume that all of them occur, let alone in the exact order we defined them. Hence, try 2 is:

pattern <- paste('_', c('1','14','22','50','52','57','76','1018','2001','3301','6005'), '$', sep='')
replace <- paste('_', c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'),         sep='')
do.call(gsub, list(pattern, replace, names(tr)))
Warning messages:
1: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'pattern' has length > 1 and only the first element will be used
2: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'replacement' has length > 1 and only the first element will be used

Can anyone fix this for me?


EDIT: I read all around SO and R doc on this subject for over a day and couldn't find anything... then when I post it I think of searching for '[r] translation table' and I find xlate. Which is not mentioned anywhere in the grep/sub/gsub documentation.

  1. Is there anything in base/gsubfn/data.table etc. to allow me to write one search-and-replacement instruction? (like a dictionary or translation table)

  2. Can you improve my clunky syntax to be call-by-reference to tr? (mustn't create temp copy of entire df)


EDIT2: my best effort after reading around was:

The dictionary approach (xlate) might be a partial answer, but this is more than a simple translation table, since the regex must be anchored at the end of the name (e.g. '_14$').

I could use gsub() or strsplit() to split on '_' then do my xlate translation on the last component, then paste() them back together. Looking for a cleaner 1/2-line idiom.

Or else I just use walls of gsub()s.
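
The split-translate-paste idea above can be sketched in base R with a named character vector acting as the translation table. This is only a sketch: `suffix_map` and `translate_suffix` are hypothetical names, and it assumes the code to translate is always whatever follows the last underscore.

```r
# Hypothetical translation table: numeric suffix -> letter code
suffix_map <- c('1' = 'R', '14' = 'I', '22' = 'P', '50' = 'O', '52' = 'C',
                '57' = 'D', '76' = 'M', '1018' = 'L', '2001' = 'S',
                '3301' = 'K', '6005' = 'G')

translate_suffix <- function(x, map) {
  # Text after the last underscore (the whole name if there is none)
  suffix <- sub('^.*_', '', x)
  # Only touch names that contain an underscore AND whose suffix is in the table
  hit <- grepl('_', x, fixed = TRUE) & suffix %in% names(map)
  # Everything before the last underscore, for the matching names only
  stem <- sub('_[^_]*$', '', x[hit])
  x[hit] <- paste0(stem, '_', map[suffix[hit]])
  x
}

cols <- c('foo', 'bar', 'xxx_14', 'xxx_2001', 'yyy_76', 'baz', 'zzz_22')
new_cols <- translate_suffix(cols, suffix_map)
print(new_cols)
```

On a real data frame this would be used as `names(tr) <- translate_suffix(names(tr), suffix_map)`; names whose suffix is not in the table are left untouched.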

smci

3 Answers


A wall of gsub() calls can always be replaced by a for loop, and you can wrap it in a function:

renamer <- function(x, pattern, replace) {
    for (i in seq_along(pattern))
            x <- gsub(pattern[i], replace[i], x)
    x
}

names(tr) <- renamer(
     names(tr),
     sprintf('_%s$', c('1','14','22','50','52','57','76','1018','2001','3301','6005')),
     sprintf('_%s' , c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'))
)

I also find sprintf more useful than paste for creating this kind of string.
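
As a small illustration of that point, the two calls below build the same patterns, but the `sprintf` format string shows the shape of the result at a glance. The vector `codes` is a made-up example, given as integers to show that no conversion to character is needed first.

```r
# Made-up integer suffix codes
codes <- c(1L, 14L, 2001L)

# paste(): the shape of the result is spread across several arguments
via_paste <- paste('_', codes, '$', sep = '')

# sprintf(): the format string itself reads like the result
via_sprintf <- sprintf('_%d$', codes)

print(via_sprintf)
```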

Marek
  • By saying `sprintf` is more useful than `paste` for creating this kind of string, I presume you mean because we can directly give it a vector of integers? – smci May 07 '12 at 06:25
  • @smci I was thinking about the string format: you can see the pattern just by looking at the first argument of `sprintf`. With `paste` and many elements, it's sometimes hard to tell what the result will look like. – Marek May 07 '12 at 21:04
  • @smci But yes - feeding it by mixed types (numeric and character) is another advantage. – Marek May 07 '12 at 21:07
  • Good points, you might like to edit them into your answer. Most important for me is that we can directly parameterize `pattern=vector(int)`. – smci May 08 '12 at 00:10
  • This was very useful, I see that you have to explicitly return x for it to work. How would this be extended to work over a list of data.frames that all need the same replacement patterns? – user1617979 Feb 11 '14 at 18:54
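
One possible way to handle the list-of-data-frames case raised in the last comment is to wrap the rename in `lapply`. This is a sketch only: `dfs` and its columns are made-up example data, and `renamer` is repeated from the answer so the block runs on its own.

```r
# renamer() as in the answer above; the trailing `x` is needed because
# a for loop does not itself return the updated vector
renamer <- function(x, pattern, replace) {
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replace[i], x)
  x
}

pattern <- sprintf('_%s$', c('1', '14', '22'))
replace <- sprintf('_%s',  c('R', 'I',  'P'))

# Made-up list of data frames that share the same suffix conventions
dfs <- list(
  a = data.frame(foo = 1, xxx_14 = 2),
  b = data.frame(yyy_1 = 3, zzz_22 = 4, baz = 5)
)

# Rename the columns of each data frame and return the modified frame
dfs <- lapply(dfs, function(d) {
  names(d) <- renamer(names(d), pattern, replace)
  d
})

print(lapply(dfs, names))
```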

The question predates the boom of the tidyverse but this is easily solved with the c(pattern1 = replacement1) option in stringr::str_replace_all.

tr <- data.frame("whatevs_1" = NA, "something_52" = NA)

tr
#>   whatevs_1 something_52
#> 1        NA           NA

patterns <- sprintf('_%s$', c('1','14','22','50','52','57','76','1018','2001','3301','6005'))
replacements <- sprintf('_%s' , c('R','I', 'P', 'O', 'C', 'D', 'M', 'L',   'S',   'K',   'G'))
                        
names(replacements) <- patterns

names(tr) <- stringr::str_replace_all(names(tr), replacements)

tr
#>   whatevs_R something_C
#> 1        NA          NA

And of course, this particular case can benefit from dplyr (note that in dplyr 1.0.0 and later, rename_with() supersedes rename_all()):

dplyr::rename_all(tr, stringr::str_replace_all, replacements)
#>   whatevs_R something_C
#> 1        NA          NA
Fons MA

Using do.call() nearly does it, but it objects to the differing argument lengths. I think I need to nest do.call() inside apply(), as in the question "apply function to elements over a list".

But I need a partial do.call() over pattern and replace.

This is all starting to make a wall of gsub(..., fixed=TRUE) look like a more efficient idiom, if flabby code.

pattern <- paste('_', c('1','14','22','50'), '$', sep='')
replace <- paste('_', c('R','I', 'P', 'O'),       sep='')
do.call(gsub, list(pattern, replace, names(tr)))
Warning messages:
1: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'pattern' has length > 1 and only the first element will be used
2: In function (pattern, replacement, x, ignore.case = FALSE, perl = FALSE,  :
  argument 'replacement' has length > 1 and only the first element will be used
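
The warnings arise because gsub() expects a single pattern and a single replacement; do.call() merely passes the two vectors through as those arguments. One way to apply the pairs one after another without a wall of calls is a fold with Reduce(). This is a minimal sketch, and `rename_suffixes` is a hypothetical helper name, not something from base R.

```r
rename_suffixes <- function(x, pattern, replace) {
  stopifnot(length(pattern) == length(replace))
  # Fold over the pair indices: each step feeds the result of the
  # previous gsub() into the next one, starting from x
  Reduce(function(acc, i) gsub(pattern[i], replace[i], acc),
         seq_along(pattern), x)
}

pattern <- paste('_', c('1', '14', '22', '50'), '$', sep = '')
replace <- paste('_', c('R', 'I',  'P',  'O'),       sep = '')

cols <- c('foo', 'a_1', 'b_14', 'c_22', 'd_50', 'e_11')
renamed <- rename_suffixes(cols, pattern, replace)
print(renamed)
```

Because every pattern is anchored with `$` and starts at an underscore, at most one of them can match a given name, so the order of the pairs does not affect the result here.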
smci