
As the title says. I have a bunch of names and I need to add a comma after the first word that starts with a capital letter.

An example:

txt <- c( "de Van-Smith J", "van der Smith G.H.", "de Smith JW", "Smith JW")

The result should be:

[1] "de Van-Smith, J" "van der Smith, G.H." "de Smith, JW" "Smith, JW"  

I have mainly been trying to use gsub() and stringr::str_replace(), but I am struggling with the regex. Any advice would be appreciated.

flee

3 Answers


You can use -

sub("([A-Z][\\w-]+)", "\\1,", txt, perl = TRUE)

#[1] "de Van-Smith, J"   "van der Smith, G.H." "de Smith, JW"       "Smith, JW"

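where ([A-Z][\\w-]+) captures a word that starts with an upper-case letter and is followed by one or more word characters or hyphens. Because sub() replaces only the first match in each string, the comma is added after the first such word only.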
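As a minimal sketch of how this pattern could be reused, the call can be wrapped in a small helper. The function name add_comma_after_surname is hypothetical and not part of the answer; it assumes the same pattern and the sample data from the question.

```r
# Hypothetical helper (name is an assumption, not from the answer):
# wraps the same sub() call so it can be applied to any character vector.
add_comma_after_surname <- function(x) {
  # [A-Z]    : an upper-case letter that starts the word
  # [\\w-]+  : one or more word characters or hyphens
  # sub() replaces only the first match in each string
  sub("([A-Z][\\w-]+)", "\\1,", x, perl = TRUE)
}

# Sample data from the question
txt <- c("de Van-Smith J", "van der Smith G.H.", "de Smith JW", "Smith JW")
result <- add_comma_after_surname(txt)
print(result)
```

Using gsub() instead of sub() here would add a comma after every capitalised word, including multi-letter initials such as "JW", which is why sub() is the better fit for this task.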

Ronak Shah

We can use

sub('\\b([A-Z]\\S+)', "\\1,", txt)
[1] "de Van-Smith, J"     "van der Smith, G.H." "de Smith, JW"        "Smith, JW"          
akrun

Another sub option

> sub("([A-Z].*)(?=\\s)", "\\1,", txt, perl = TRUE)
[1] "de Van-Smith, J"     "van der Smith, G.H." "de Smith, JW"
[4] "Smith, JW"
ThomasIsCoding