
I am a Scala beginner trying out topic modeling with the Stanford Topic Modeling Toolbox (TMT) [0]. However, I can't get past preparing my data set: reading the CSV file fails. Here's my code:

import scalanlp.io._;

val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);

println(source.data.size);

This throws the following error:

Stanford TMT\example-0-test.scala:6: error: not found: value IDColumn
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);

Similarly, I get an error with other data pre-processing functions such as the tokenizer. Here's the code:

// Stanford TMT Example 0 - Basic data loading
// http://nlp.stanford.edu/software/tmt/0.4/


import scalanlp.io._;
val source = CSVFile("pubmed-oa-subset.csv") ;
println(source.data.size);

val tokenizer = {
  SimpleEnglishTokenizer()
}

Here's the error I get for the above code:

error: not found: value SimpleEnglishTokenizer
  SimpleEnglishTokenizer()

I am using the same CSV file as given on the TMT homepage [1]. Also, the script and the data are in the same folder.

What is the issue? I am unable to run the exact same test examples from the TMT homepage.

[0] http://nlp.stanford.edu/software/tmt/tmt-0.4/

[1] http://nlp.stanford.edu/software/tmt/tmt-0.4/examples/pubmed-oa-subset.csv

Dexter

1 Answer


I also ran into a problem when running the demo, but it was different from yours. Mine was caused by garbled characters in the CSV file (http://nlp.stanford.edu/software/tmt/tmt-0.4/examples/pubmed-oa-subset.csv). I opened the CSV file in an editor as UTF-8 and replaced the unreadable characters (they all show up as the same symbol), and after that it ran fine.
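If you would rather not do the replacement by hand, the same cleanup can be sketched in Scala using only the JDK. This is a minimal sketch, not part of TMT: `CsvCleaner`, `decodeLenient`, `scrub`, and `cleanFile` are names I made up for illustration, and replacing the bad characters with a space is just one possible choice.

```scala
import java.nio.ByteBuffer
import java.nio.charset.{CodingErrorAction, StandardCharsets}
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of TMT.
object CsvCleaner {
  // Decode bytes as UTF-8, substituting U+FFFD for any malformed
  // sequence instead of throwing an exception.
  def decodeLenient(bytes: Array[Byte]): String = {
    val decoder = StandardCharsets.UTF_8.newDecoder()
      .onMalformedInput(CodingErrorAction.REPLACE)
      .onUnmappableCharacter(CodingErrorAction.REPLACE)
    decoder.decode(ByteBuffer.wrap(bytes)).toString
  }

  // Replace every U+FFFD replacement character with a plain space.
  def scrub(text: String): String = text.replace('\uFFFD', ' ')

  // Read inPath, clean it, and write the result to outPath as UTF-8.
  def cleanFile(inPath: String, outPath: String): Unit = {
    val cleaned = scrub(decodeLenient(Files.readAllBytes(Paths.get(inPath))))
    Files.write(Paths.get(outPath), cleaned.getBytes(StandardCharsets.UTF_8))
  }
}
```

You would call it as `CsvCleaner.cleanFile("pubmed-oa-subset.csv", "pubmed-oa-subset-clean.csv")` (the output file name is only an example) and then point `CSVFile` at the cleaned copy.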

Your problem looks like either a missing class file or a CSV format error; I'm not sure which. You can still try my fix, since I think garbled characters are a common problem in the CSV file downloaded from the home page. You could also check the integrity of the executable jar file and of the CSV file.
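On the missing-class side, one thing that may be worth checking is the import list. As far as I recall, the later example scripts on the TMT 0.4 page start with more imports than just `scalanlp.io._`, and a "not found: value" error is what Scala reports when a name has not been imported. The sketch below assumes those package names are right for your TMT version; please compare against the example scripts that ship with your download rather than trusting my memory.

```scala
// Import list as I recall it from the later TMT 0.4 example scripts.
// Assumption: these package names match your TMT version.
import scalanlp.io._;
import scalanlp.stage._;
import scalanlp.stage.text._;
import scalanlp.text.tokenize._;
import scalanlp.pipes.Pipes.global._;

// The two statements from the question that failed to compile.
val source = CSVFile("pubmed-oa-subset.csv") ~> IDColumn(1);
val tokenizer = SimpleEnglishTokenizer();
```

This needs to be run through the TMT jar like the other example scripts, since the `scalanlp` packages come from it.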

If none of the above works, you can ask on the Stanford java-nlp-user mailing list: https://mailman.stanford.edu/mailman/listinfo/java-nlp-user

Huskar