
I've been following a tutorial on using the mallet package in R to create topic models. My text file has one sentence per line and contains about 50 sentences. It looks like this:

Thank you again and have a good day :).
This is an apple.
This is awesome!
LOL!
i need 2.
.
.
. 

This is my code:

Sys.setenv(NOAWT=TRUE)

#setup the workspace
# Set working directory
dir<-"/Users/jxn"
Dir <- "~/Desktop/Chat/malletR/text" # adjust to suit
require(mallet)
documents1 <- mallet.read.dir(Dir)
View(documents1)
stoplist1<-mallet.read.dir("~/Desktop/Chat/malletR/stoplists")
View(stoplist1)
mallet.instances <- mallet.import(documents1$id, documents1$text, "~/Desktop/Chat/malletR/stoplists/en.txt", token.regexp ="\\p{L}[\\p{L}\\p{P}]+\\p{L}")

Everything works except for the last line of the code:

mallet.instances <- mallet.import(documents1$id, documents1$text, "~/Desktop/Chat/malletR/stoplists/en.txt", token.regexp ="\\p{L}[\\p{L}\\p{P}]+\\p{L}")

I keep getting this error:

Error in .jcall("RJavaTools", "Ljava/lang/Object;", "invokeMethod", cl,  : 
  java.lang.NoSuchMethodException: No suitable method for the given parameters

According to the package, this is how the function should be:

mallet.instances <- mallet.import(documents$id, documents$text, "en.txt",
                    token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}")

I believe it has something to do with the token.regexp argument, because
documents1 <- mallet.read.dir(Dir) works just fine, which I take to mean that the first three arguments supplied to mallet.import were correct.

This is the Git repo with the tutorial I was following: https://github.com/shawngraham/R/blob/master/topicmodel.R

Any help would be much appreciated.

Thanks, J

jxn

2 Answers


I suspect the problem is with your text file. I have encountered the same error and resolved it by using the as.character() function as follows:

mallet.instances <- mallet.import(as.character(documents$id), as.character(documents$text), "en.txt", FALSE, token.regexp="\\p{L}[\\p{L}\\p{P}]+\\p{L}")
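One way to check whether this applies to you is to look at the column classes before calling mallet.import. The sketch below uses a small made-up data frame (the `documents` object here is a hypothetical stand-in, not real output from mallet.read.dir) and assumes the columns arrive as factors, which could happen under the pre-4.0 R default of stringsAsFactors = TRUE. It illustrates the coercion only and is not a confirmed diagnosis of the error.

```r
# Hypothetical stand-in for the data frame returned by mallet.read.dir();
# stringsAsFactors = TRUE mimics the default of R versions before 4.0.
documents <- data.frame(
  id   = c("doc1.txt", "doc2.txt"),
  text = c("This is an apple.", "This is awesome!"),
  stringsAsFactors = TRUE
)

# Inspect the column classes before handing the columns to Java.
col_classes <- sapply(documents, class)
print(col_classes)

# Coerce both columns to plain character vectors.
ids   <- as.character(documents$id)
texts <- as.character(documents$text)

# Both vectors are now character, not factor.
stopifnot(is.character(ids), is.character(texts))
```

If the classes printed for your own data frame are not "character", passing `ids` and `texts` in place of documents$id and documents$text is the same fix as the call above.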

histelheim
litlogger
Having the same problem, but the solution didn't work for me: http://stackoverflow.com/questions/5228750/rjava-jcall-issue. I wonder if you have any thoughts. – add-semi-colons Oct 30 '14 at 22:51

Are you sure you also converted the id field to character? It is easy to overlook that part of the advice and leave it as an integer.

Also, there is a typo in the code sample: the backslashes have to be escaped:
    token.regexp = "\\p{L}[\\p{L}\\p{P}]+\\p{L}"
This usually happens because the HTML text editor eats one backslash.
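To see what the escaping does, here is a minimal base R sketch. It uses R's own PCRE engine (perl = TRUE) rather than MALLET's Java regex engine, so treat the tokenization as an approximation of what mallet.import would do. The sample line is taken from the question, and the variable names are only illustrative.

```r
# In R source, "\\p" is an escaped backslash followed by "p".
# A single "\p" is an unrecognized escape and fails to parse.
pattern <- "\\p{L}[\\p{L}\\p{P}]+\\p{L}"

# The string actually stored contains single backslashes.
n_chars <- nchar(pattern)
cat(pattern, "\n")

# Apply the pattern to one sample line with R's PCRE engine.
sample_line <- "This is an apple."
tokens <- regmatches(sample_line,
                     gregexpr(pattern, sample_line, perl = TRUE))[[1]]
print(tokens)
```

Note that the pattern needs at least three characters per token (a letter, one or more letters or punctuation marks, then a letter), so short words such as "is" and "an" are not matched.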
Raja