I have a vector of strings, some of which include punctuation or other symbols. For example:
words <- c("hi", "my.", "name!", "is98", "\"joe\"")
My goal is to create a vector that contains all of these words, but with the punctuation, numbers, and symbols split out into their own elements. In this case the result would be:
c("hi", "my", ".", "name", "!", "is", "98", "\"", "joe", "\"")
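One way to get that result without splitting at all is to extract every token directly with gregexpr() and regmatches(), then flatten with unlist(). This is a minimal sketch, not the only approach. It assumes that "symbols" means the POSIX [[:punct:]] class and that digits should stay grouped (so "98" is one element), and that each punctuation character should be its own element.

```r
# Example input from the question; the embedded quotes are escaped as \"
words <- c("hi", "my.", "name!", "is98", "\"joe\"")

# Assumed token definition: a run of letters, a run of digits,
# or a single punctuation character.
pattern <- "[[:alpha:]]+|[[:digit:]]+|[[:punct:]]"

# gregexpr() locates every match in every string; regmatches() pulls them
# out as a list (one character vector per input element); unlist() flattens it.
tokens <- unlist(regmatches(words, gregexpr(pattern, words)))
print(tokens)
```

Because the alternatives cover disjoint character classes, each match is unambiguous, and elements with no punctuation (such as "hi") simply contribute one token.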
My initial plan was to use grep() to find the indices of the elements that contain punctuation, then loop over those elements and use strsplit() to split each one on that punctuation, as follows:
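# indices of the elements that contain one of the listed punctuation marks
puncIndex <- grep('[\\"!?.^]', words)
for(i in puncIndex){
  # the list returned here is neither stored nor printed
  strsplit(words[i], '[\\"!?.^]')
}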
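For the first problem, strsplit() is already vectorised, so no loop is needed to reach every element, and unlist() flattens the list it returns into a single character vector. The sketch below uses the class [[:punct:][:digit:]]+ as an assumed stand-in for the bracket expression above. It also shows why splitting alone cannot give the desired output: strsplit() removes the separators it splits on.

```r
words <- c("hi", "my.", "name!", "is98", "\"joe\"")

# One vectorised call returns a list with one character vector
# per element of `words`.
pieces <- strsplit(words, "[[:punct:][:digit:]]+")

# unlist() flattens that list back into one character vector.
flat <- unlist(pieces)
print(flat)
# The matched separators are gone, and "\"joe\"" contributes a leading ""
# because that string starts with a separator.
```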
But I'm running into two problems. First, the result of strsplit() is a list, and I can't figure out how to cleanly move its components back into the original vector. Second, even when I call strsplit() on a single word, it only returns the first part. For example:
strsplit(words[2], ".")
[[1]]
[1] "my"
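Two separate behaviours are worth checking here. In a regular expression an unescaped . matches any character, so a literal dot needs either fixed = TRUE or the bracket form [.]. Also, strsplit() drops the separator itself, and a match at the very end of a string does not produce a trailing empty piece, so "my." can only come back as "my". Note that with "." treated as a regex, strsplit() returns three empty strings; a result of "my" is what a literal-dot split gives.

```r
x <- "my."

# Regex ".": every character is a separator, so only empty pieces remain.
print(strsplit(x, ".")[[1]])

# Literal dot, two equivalent ways; the dot itself is removed
# and no trailing "" is produced.
print(strsplit(x, ".", fixed = TRUE)[[1]])
print(strsplit(x, "[.]")[[1]])
```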
EDIT: added numbers as another class of characters to be separated out.