Situation 1
I get strange results when applying the phrasetotoken function from the quanteda package:
dict <- dictionary(list(words = "......*lokale energie productie*......"))
txt <- c("I like lokale energie producties")
phrasetotoken(txt, dict)
Problem: Sometimes I get lokale_energie_producties back, and sometimes, incorrectly, the original lokale energie producties.
The problem seems to be connected to the dots in the dictionary. As far as I can tell, these dots are needed to handle leading and trailing characters (e.g., "1lokale energie productieniveau").
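To illustrate what the dots might be doing, here is a minimal base R sketch. It does not call quanteda, and it assumes the pattern is read as a regular expression, which may not be how phrasetotoken interprets it. The third test string and the `loose` pattern are hypothetical additions for comparison only.

```r
# Base R only; none of this calls quanteda.
txt <- c("I like lokale energie producties",
         "1lokale energie productieniveau",
         "I like lokale energie producties in 2016")

# Read as a regular expression, every "." demands one character, so the
# pattern needs at least five characters before "lokale" and at least
# six characters after "producti".
dotted <- "......*lokale energie productie*......"
matched <- grepl(dotted, txt)
matched

# Hypothetical alternative: \\w* allows zero or more word characters
# on either side of the phrase, so short contexts still match.
loose <- "(\\w*lokale) (energie) (productie\\w*)"
joined <- gsub(loose, "\\1_\\2_\\3", txt, perl = TRUE)
joined
```

Under that assumption only the third string matches the dotted pattern, which would be consistent with the phrase being joined in some texts and not in others.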
Situation 2
When loading a txt file, the phrasetotoken function does not work at all.
txt <- paste(readLines("foo.txt"), collapse = " ")
txt <- phrasetotoken(txt, dict)
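For reference, a minimal base R sketch of the reading step on its own, using a temporary stand-in file instead of foo.txt (the file contents are made up for illustration). It only checks that the text ends up as a single string, with a phrase that was split over two lines rejoined by a space.

```r
# Write a small stand-in file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("I like lokale", "energie producties"), path)

lines <- readLines(path)              # one element per line of the file
txt <- paste(lines, collapse = " ")   # collapse is an argument of paste()
length(txt)
txt
```

Note that `collapse` has to be passed to paste(), not to readLines(); otherwise the result stays a vector with one element per line.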
NB. Using the function readtext instead of readLines throws the following error:
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘phrasetotoken’ for signature ‘"readtext", "dictionary"’
Any help is appreciated.