How can I tokenize a string in a Java class using the Stanford Parser?

I can only find examples of DocumentPreprocessor and PTBTokenizer that read text from an external file:

    // option #1: By sentence
    DocumentPreprocessor dp = new DocumentPreprocessor("hello.txt");
    for (List<HasWord> sentence : dp) {
      System.out.println(sentence);
    }

    // option #2: By token
    PTBTokenizer<CoreLabel> ptbt = new PTBTokenizer<>(new FileReader("hello.txt"),
        new CoreLabelTokenFactory(), "");
    while (ptbt.hasNext()) {
      CoreLabel label = ptbt.next();
      System.out.println(label);
    }

Thanks.

Naveen

1 Answer

The PTBTokenizer constructor takes a java.io.Reader, so you can wrap your text in a StringReader and tokenize it without going through a file.
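
A minimal sketch of that idea, assuming the Stanford parser (or CoreNLP) jar is on the classpath. The class name `StringTokenizerSketch` and the helper `tokenize` are made up for illustration; the constructor call is the same three-argument form used in the question, with the FileReader swapped for a StringReader.

```java
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;

import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this example.
class StringTokenizerSketch {

    // Tokenizes an in-memory string by wrapping it in a StringReader.
    static List<String> tokenize(String text) {
        PTBTokenizer<CoreLabel> tokenizer = new PTBTokenizer<>(
                new StringReader(text), new CoreLabelTokenFactory(), "");
        List<String> words = new ArrayList<>();
        while (tokenizer.hasNext()) {
            // word() returns the token text stored in the CoreLabel.
            words.add(tokenizer.next().word());
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : tokenize("Hello world, this is a test.")) {
            System.out.println(word);
        }
    }
}
```

If you tokenize many strings, `PTBTokenizer.factory(new CoreLabelTokenFactory(), "")` should give you a reusable TokenizerFactory; you would then call `getTokenizer(new StringReader(text)).tokenize()` once per string.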

CapelliC
  • Can you write the code for the constructor and show how I can use a Reader with it? Thanks – Naveen Oct 11 '12 at 20:22
  • Never mind, this is giving me tokens: List<CoreLabel> rawWords = tokenizerFactory.getTokenizer(new StringReader(sentence)).tokenize(); System.out.println(rawWords.get(0).value()); – Naveen Oct 11 '12 at 20:46
  • I took some time to open NetBeans, create a new project, etc... then a blackout... damn... – CapelliC Oct 11 '12 at 21:01
  • @Naveen Thanks for sharing your solution! However, won't that create a new PTBTokenizer object each time you pass in a different sentence? If you have multiple sentences, I guess the pre-step to your solution is to concat them into a single String "sentences" and then use your solution on "sentences"? – Nishant Kelkar Dec 27 '14 at 01:33