Please see the MWE below. My custom tokenizer is not being used by DocumentTermMatrix. Why? The tm package version is 0.71.
library(tm)
ts <- c("This is a testimonial")
corpDs <- Corpus(VectorSource(ts))
# This does not work: the custom tokenizer appears to be ignored
ownTokenizer <- function(x) unlist(strsplit(as.character(x), "i+"))
tdm <- DocumentTermMatrix(corpDs, control = list(tokenize = ownTokenizer))
as.matrix(tdm)
# This works: calling the tokenizer directly splits on "i" as expected
ownTokenizer(ts)
Output:
    Terms
Docs testimonial this
   1           1    1
[1] "Th" "s " "s a test" "mon" "al"
Thank you,
Tobias