The textcnt function in R's tau package has a split argument whose default value is `split = "[[:space:][:punct:][:digit:]]+"`. Because [:punct:] includes the apostrophe ', this default also splits words at apostrophes, and I don't want that. How can I modify the argument so that it doesn't use the apostrophe to split words?
This code:
`library(tau)
text <- "I don't want the function to use the ' to split"
textcnt(text, split = "[[:space:][:punct:][:digit:]]+", method = "string", n = 1L)`
produces this output:
don      function i        split    t        the      to       use      want
1        1        1        1        1        2        2        1        1
Instead of getting don 1 and t 1, I would like to keep don't as one word.
I have tried using str_replace_all from stringr to remove the punctuation beforehand and then dropping the [:punct:] part of the split argument in textcnt, but then symbols such as &, > or " are no longer used to split. I have also tried modifying the split argument itself, but then either the sentence is not split at all or the symbols are kept.
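To make the goal concrete, here is a minimal sketch in base R of the splitting behavior I am after. It uses strsplit rather than textcnt, and the names `text`, `pattern`, `tokens` and `counts` are only illustrative. It assumes a Perl-compatible regex engine (perl = TRUE), because it relies on a negative lookahead to exclude the apostrophe from [:punct:]. Whether textcnt accepts the same pattern through its split argument is an assumption I have not confirmed.

```r
# Illustrative sketch in base R; variable names are hypothetical.
text <- "I don't want the function to use the ' to split"

# Split on runs of whitespace, digits, or any punctuation except the apostrophe.
# The negative lookahead (?!') requires a Perl-compatible regex engine.
pattern <- "(?:[[:space:][:digit:]]|(?!')[[:punct:]])+"

tokens <- strsplit(tolower(text), pattern, perl = TRUE)[[1]]

# A free-standing apostrophe survives as its own token, so keep only
# tokens that contain at least one letter or digit.
tokens <- tokens[grepl("[[:alnum:]]", tokens)]

# Count each distinct token; "don't" is counted as a single word.
counts <- table(tokens)
print(counts)
```

With this pattern, symbols such as &, > and " still act as separators, while don't stays in one piece.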
Thank you