This is a follow-up to this question: Concatenate previous and latter words to a word that match a condition in R
I am looking for a regex that splits a string at the second space after each comma. See the example below:
vector <- c("Paulsen", "Kehr,", "Diego",
"Schalper", "Sepúlveda,", "Alejandro",
"Von Housen", "Kush,", "Terry")
X <- paste(vector, collapse = " ")
X
## this is the string I am looking to split:
"Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"
The second space after each comma is the splitting criterion, so my desired output is:
"Paulsen Kehr, Diego"
"Schalper Sepúlveda, Alejandro"
"Von Housen Kush, Terry"
I came up with a pattern, but it is not quite working:
[^ ]+ [^ ]+, [^ ]+( )
Using it with strsplit removes all the matched words instead of splitting only at group 1, i.e. the space captured in [^ ]+ [^ ]+, [^ ]+(group-1). I think I just need to exclude the full match and match only the space that follows it (see the regex demo).
strsplit(X, "[^ ]+ [^ ]+, [^ ]+( )")
# [1] "" [2] "" [3] "Von Housen Kush, Terry"
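To illustrate why the words disappear: as far as I understand, strsplit treats the entire match as the delimiter, and the capture group gets no special treatment. A quick check of what the pattern actually matches, using base R's regmatches and gregexpr (the variable names `pattern` and `matched` are just for this sketch):

```r
X <- "Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"
pattern <- "[^ ]+ [^ ]+, [^ ]+( )"

# Extract every full match of the pattern; these are the pieces
# that strsplit() throws away as "delimiters".
matched <- regmatches(X, gregexpr(pattern, X))[[1]]
matched
# The last name has no trailing space, so it is never matched and survives.
```

Each match covers the whole name plus the trailing space, which is why only the unmatched tail of the string is left.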
Can anyone think of a regex for finding the second space after each comma?
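For reference, here is a workaround sketch that sidesteps the problem rather than solving it with a single regex: replace only the space that follows ", word" with a marker, then split on the marker. It assumes the data never contains a newline, which is used as the marker here; `marked` and `parts` are hypothetical names.

```r
X <- "Paulsen Kehr, Diego Schalper Sepúlveda, Alejandro Von Housen Kush, Terry"

# Keep ", <word>" (group 1) and swap the space after it for a newline.
# Assumption: "\n" does not occur anywhere in the input.
marked <- gsub("(, [^ ]+) ", "\\1\n", X)

# Split on the marker as a fixed string, not as a regex.
parts <- strsplit(marked, "\n", fixed = TRUE)[[1]]
parts
```

This gives the three names I want, but I would still prefer a single pattern that works directly in strsplit.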