
I have a dataset that contains spaces and other punctuation characters. I'm trying to replace the spaces and special characters with "_". This creates runs of several "_" in a row, so I'd also like to collapse those into one, using the following function as described here:

```r
removeSpace <- function(x){
    class1 <- class(x)
    x <- as.character(x)
    x <- gsub(" |&|-|/|'|(|)",'_', x) # convert special characters to _
    x <- gsub("([_])\\1+","\\1", x)   # convert multiple _ to single _

    if(class1 == 'character'){
        return(x)
    }
    if(class1 == 'factor'){
        return(as.factor(x))
    }
}
```

The issue is that instead of replacing only the spaces with "_", it inserts "_" between every character (i.e. "test" -> "t_e_s_t").

What am I doing wrong?

screechOwl

1 Answer


You don't need to run two separate replacements to accomplish this. Just put a `+` quantifier in your match pattern, so that each run of consecutive special characters is replaced by a single `_`.

Match: `[-/&'() ]+`

Replace with: `_`
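A minimal sketch of that match/replace pair as a single `gsub()` call in R. The sample strings are made up for illustration:

```r
# Hypothetical sample input; not from the original dataset
x <- c("test", "a & b", "rock-n-roll (live)", "and/or it's")

# One or more consecutive spaces/special characters become a single "_"
cleaned <- gsub("[-/&'() ]+", "_", x)
cleaned
# "test" "a_b" "rock_n_roll_live_" "and_or_it_s"
```

A string with no special characters ("test") comes through unchanged. A trailing special character, like the closing parenthesis above, still leaves a trailing `_`; removing it would take an extra step.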

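Also note that I used a character set instead of alternating between each option with `|`. This is generally a better approach when matching one of several individual characters. It also avoids the behavior you saw: outside a character set, `(` and `)` are grouping metacharacters, so the `(|)` at the end of your pattern is an empty group that matches the empty string at every position, and `gsub` inserts `_` at each of those positions. Inside a character set, `(` and `)` are just literal characters.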
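Putting that together, here is one possible rewrite of the question's `removeSpace()`. It is a sketch, not the asker's code: it keeps the factor/character round trip but checks with `is.factor()`, and it returns a character vector for any other input type (the original falls through and returns `NULL` for, e.g., numeric input). The two `grepl()` lines show that the original alternation reports a match even in a string with no special characters, while the character-set pattern does not.

```r
# Sketch of a single-gsub() version of removeSpace()
removeSpace <- function(x) {
  was_factor <- is.factor(x)          # remember whether to convert back
  x <- as.character(x)
  x <- gsub("[-/&'() ]+", "_", x)     # each run of specials -> one "_"
  if (was_factor) as.factor(x) else x
}

# The original pattern's "(|)" can match the empty string, so it "matches" plain text:
grepl(" |&|-|/|'|(|)", "test")   # TRUE
grepl("[-/&'() ]+", "test")      # FALSE

removeSpace(c("test", "a & b"))          # "test" "a_b"
removeSpace(factor(c("x / y", "x / y"))) # factor with the single level "x_y"
```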

CAustin
  • +1. But I would also have added an `_` (underscore) to that character set, so that the second `gsub` is not required at all. What do you think? – Gurmanjot Singh Oct 02 '17 at 18:55
  • @Gurman I suppose that wouldn't hurt, but the OP only indicated that they needed the second replacement due to an undesired outcome of their first replacement. As I understood the question, multiple `_`s were not a problem in the original string. @screechOwl Do you think this would be useful? – CAustin Oct 02 '17 at 19:11
  • I ended up using this: `x <- gsub("[[:punct:]]+",'_', x)`. Thanks for the `+` idea. – screechOwl Oct 02 '17 at 20:14
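One caveat on the `[[:punct:]]` approach: in POSIX regular expressions the space character belongs to `[:space:]`, not `[:punct:]`, so spaces are left in place. A sketch that adds `[:space:]` to the same bracket expression is below (sample strings are hypothetical). Because `_` is itself in `[:punct:]`, this also collapses underscores already present in the input, which is the effect suggested in the first comment. Exact class membership for non-ASCII characters can depend on the locale.

```r
# Hypothetical sample input
x <- c("a & b", "already__under_scored", "tab\there")

# [[:punct:]] alone leaves the spaces around "&" untouched
gsub("[[:punct:]]+", "_", "a & b")     # "a _ b"

# Punctuation and whitespace (spaces, tabs) in one class; runs become a single "_"
gsub("[[:punct:][:space:]]+", "_", x)  # "a_b" "already_under_scored" "tab_here"
```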