
I want to remove extra spaces, add spaces where required, and capitalize the first letter of each word that follows a special character, using R.

string <- "apple,banana, cat, doll and donkey;     fish,goat"

I want the output to be:

Apple, Banana, Cat, Doll and donkey; Fish, Goat

I tried

gsub("(^.|,.|;.)", "\\U\\1", string, perl=T, useBytes = F)

It didn't work. Please help.

Ronak Shah
Rajan
  • You need to allow for whitespace: `gsub("(^.|[,;]\\s*.)", "\\U\\1", string, perl=TRUE)` – rawr Dec 07 '15 at 14:21
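
As a quick check of why the original attempt fell short, the sketch below runs both the attempted pattern and the one suggested in the comment on the sample string. The variable names `attempt` and `suggested` are introduced here only for illustration.

```r
string <- "apple,banana, cat, doll and donkey;     fish,goat"

# Original attempt: `,.` and `;.` capture only the single character right
# after the punctuation, which is often a space, so "cat", "doll" and
# "fish" stay lowercase.
attempt <- gsub("(^.|,.|;.)", "\\U\\1", string, perl = TRUE)

# Pattern from the comment: `\\s*` skips any whitespace before the letter,
# so every item is capitalized, but the spacing itself is left unchanged.
suggested <- gsub("(^.|[,;]\\s*.)", "\\U\\1", string, perl = TRUE)

print(attempt)
print(suggested)
```

Neither call normalizes the spacing, which the question also asks for.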

1 Answer


You can use

string <- "apple,banana, cat, doll and donkey;     fish,goat"
trimws(gsub("(^|\\p{P})\\s*(.)", "\\1 \\U\\2", string, perl=TRUE))
## => [1] "Apple, Banana, Cat, Doll and donkey; Fish, Goat"


The PCRE regex matches:

  • (^|\\p{P}) - (Group 1) the start of the string or any punctuation character
  • \\s* - zero or more whitespace characters
  • (.) - (Group 2) any character other than a newline
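
To see what the pattern described above actually picks up, one option is to list its matches on the sample string. This is only a sketch: the `pattern` and `matches` variables are names chosen here for illustration.

```r
string <- "apple,banana, cat, doll and donkey;     fish,goat"
pattern <- "(^|\\p{P})\\s*(.)"

# gregexpr() locates every match; regmatches() extracts the matched text.
matches <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]

# One match at the start of the string, plus one per punctuation mark:
# each spans the punctuation, any whitespace, and the following character.
print(matches)
```

Each match ends in the single character that Group 2 captures, which is the one the replacement later converts to uppercase.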

The replacement:

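  • \\1 - a backreference to Group 1, which puts back the punctuation (or nothing at the start of the string)
  • a literal space - inserts a space between the punctuation and the next character, or at the start of the string
  • \\U\\2 - converts the Group 2 character to uppercase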

Finally, trimws removes the leading space that the regex adds at the start of the string.
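
The two steps can also be run separately to make the intermediate result visible. This is a minimal sketch of the same approach; the names `spaced` and `result` are introduced here for illustration.

```r
string <- "apple,banana, cat, doll and donkey;     fish,goat"

# Step 1: the substitution alone. The match at the start of the string
# also gets a space in front of it, so the result begins with " ".
spaced <- gsub("(^|\\p{P})\\s*(.)", "\\1 \\U\\2", string, perl = TRUE)

# Step 2: trimws() strips leading and trailing whitespace.
result <- trimws(spaced)

print(spaced)
print(result)
```

Note that `\\p{P}` matches any punctuation character, so a string containing hyphens, periods or apostrophes would have the characters after those capitalized as well.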

Wiktor Stribiżew