
I don't seem to understand gsub or stringr. Example:

 > a<- "a book"

> gsub(" ", ".", a)

[1] "a.book"

Okay. BUT:

> a<-"a.book"

> gsub(".", " ", a)

[1] "      "

I would have expected

"a book"

I'm replacing the full stop with a space.

Also, with stringr, str_replace(a, ".", " ") returns:

" .book"

and str_replace_all(a, ".", " ") returns

"      "

I can use stringi: stri_replace(a, " ", fixed="."):

"a book"

I'm just wondering why gsub (and str_replace) don't act as I'd have expected. They work when replacing a space with another character, but not the other way around.

Ashley Medway
Oli
  • Escape the dot like this: `gsub("\\.", " ", a)`. Otherwise, it is treated as a regex metacharacter that matches any character. – Gopala May 26 '16 at 14:27
  • Or `gsub('.', ' ', a, fixed = TRUE)`. – Sotos May 26 '16 at 14:28
  • The first words in the documentation for `gsub`'s `pattern` argument are "character string containing a regular expression", where `regular expression` is actually a link to another topic. I suggest you read it. – joran May 26 '16 at 14:28
  • Okay, thanks, that makes sense. I've used them a bit, as well as escape sequences like "\n" for a new line. I just didn't know "." was a regex metacharacter. – Oli May 26 '16 at 14:38

1 Answer


That's because the first argument to gsub, namely pattern, is interpreted as a regular expression. In a regex, the period . is a metacharacter that matches any single character (see ?base::regex), so gsub(".", " ", a) replaces every character of "a.book" with a space. To match a literal period, you need to escape it:

gsub("\\.", " ", a)
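
As a rough sketch of the difference, the snippet below runs the unescaped pattern next to the three literal-dot variants mentioned in this thread (escaping, `fixed = TRUE`, and a character class). It uses only base R and the same `a` as in the question; the result variable names are illustrative, not from the original post.

```r
a <- "a.book"

# Unescaped: "." is a regex metacharacter, so every character is replaced.
all_replaced <- gsub(".", " ", a)

# Escaped: "\\." matches only a literal period.
escaped <- gsub("\\.", " ", a)

# fixed = TRUE treats the pattern as a plain string rather than a regex.
fixed_match <- gsub(".", " ", a, fixed = TRUE)

# A character class also makes the period literal.
char_class <- gsub("[.]", " ", a)

print(c(all_replaced, escaped, fixed_match, char_class))
```

The last three calls should all give "a book"; only the first treats the dot as "any character".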
Iaroslav Domin
  • For information, you can also strip `.` of its special meaning by enclosing it in a character class, `[.]`. It tends to be easier to read than `\\.` when you have multiple literal dots in your regex (e.g. in `\\w[.]\\d{2}[.][^.]+` it is easier to spot the literal dots than in `\\w\\.\\d{2}\\.[^.]+`). – Tensibai May 26 '16 at 14:43
  • I'll use `[ ]` for everything now to avoid any future confusion. Thank you. – Oli May 26 '16 at 14:50
  • @OliPaul be careful: `[abcd]` means any one of a, b, c or d (this is a character class), not a followed by b, etc. The character class has its uses. Regexes are great, but you have to understand them if you don't want to shoot yourself in the foot. As said in this answer and the comments, read the docs :) – Tensibai May 26 '16 at 14:58
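
To illustrate the caveat about character classes, here is a small base R sketch comparing `[abcd]` with the plain sequence `abcd`. The input string `x` and the variable names are made up for this example.

```r
x <- "abcd dcba"

# "[abcd]" matches any single one of a, b, c or d,
# so every letter in x is replaced individually.
class_result <- gsub("[abcd]", "-", x)

# "abcd" matches only the literal sequence a, b, c, d in that order.
seq_result <- gsub("abcd", "-", x)

print(class_result)
print(seq_result)
```

So wrapping a whole pattern in `[ ]` changes what it matches; the brackets are only a safe substitute for escaping when they enclose a single character such as `[.]`.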