I need to write a generic function for "find and replace in R". How can I write a function that takes the following inputs

  • A CSV file (or data frame)
  • A string to find, for example "name@email.com"
  • A string to replace the found string with, for example "medium"

and rewrites the CSV file/data frame so that all the found strings are replaced with the replacement string?

histelheim
  • What have you tried? Is it the exact string or a partial match? E.g., do I replace "the email is name@email.com" with "the email is medium"? Do you have to do this in R? The command line tool `sed` is the best thing I can think of for doing what you're asking. – Justin Oct 10 '12 at 22:27

2 Answers

Here's a quick function to do the job:

library(stringr)

replace_all <- function(df, pattern, replacement) {
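  # Flag the columns that hold text (factor or character)
  char <- vapply(df, function(x) is.factor(x) || is.character(x), logical(1))
  # Replace within each text column; str_replace_all returns character vectors
  df[char] <- lapply(df[char], str_replace_all, pattern, replacement)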
  df
}

replace_all(iris, "setosa", "barbosa")

Basically, it identifies all the columns in the data frame that are characters or factors, and then applies str_replace_all to each of them (factor columns come back as character vectors). The pattern is treated as a regular expression, so if you want to match a fixed string, wrap it in fixed(), e.g.

replace_all(iris, fixed("setosa"), "barbosa")
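
For the CSV half of the question, here is a minimal base-R sketch of the same idea, assuming the file fits in memory and that the search string should be matched literally. The helper name `replace_in_csv` and the temporary file paths are made up for illustration; `fixed = TRUE` matters for something like "name@email.com", because as a regular expression the "." would match any character.

```r
# Hypothetical helper (not part of stringr): literal find-and-replace over a CSV file
replace_in_csv <- function(path_in, path_out, find, replacement) {
  df <- read.csv(path_in, stringsAsFactors = FALSE)
  is_text <- vapply(df, is.character, logical(1))
  for (col in names(df)[is_text]) {
    # fixed = TRUE treats `find` as a literal string, not a regular expression
    df[[col]] <- gsub(find, replacement, df[[col]], fixed = TRUE)
  }
  write.csv(df, path_out, row.names = FALSE)
  invisible(df)
}

# Usage with throwaway files (example data is invented)
tmp_in  <- tempfile(fileext = ".csv")
tmp_out <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:2, contact = c("name@email.com", "other@email.com")),
          tmp_in, row.names = FALSE)
out <- replace_in_csv(tmp_in, tmp_out, "name@email.com", "medium")
```

Numeric columns are left untouched, so only text cells are rewritten in the output file.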
hadley

The solution below will work for "exact" matches:

dat <- data.frame(a = letters[1:10], y = letters[10:1])
# apply() coerces the data frame to a character matrix, so the result is a matrix
apply(dat, 2, function(v, foo, bar) { v[v == foo] <- bar; v }, foo = "a", bar = "baz")

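However, this won't replace strings that merely contain the search string (e.g. "banana" when searching for "a"); only cells that equal it exactly are changed. It will also have many edge cases that won't work the way you might expect.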
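
To make the exact-versus-partial distinction concrete, here is a small sketch; the vector `v` is invented for illustration and is not from the data above.

```r
v <- c("a", "banana", "b")

# Exact match: only elements equal to "a" are replaced
exact <- v
exact[exact == "a"] <- "baz"
# exact is c("baz", "banana", "b")

# Substring match: every literal "a" inside any element is replaced
partial <- gsub("a", "baz", v, fixed = TRUE)
# partial is c("baz", "bbaznbaznbaz", "b")
```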

As I mentioned in my comment, the command line tool sed is ideally suited for this kind of operation.

Justin