I am trying to write my own Java program to segment a set of text files into sentences. I searched the available NLP tools and found GATE, but I couldn't get its pipeline to do only the sentence segmentation.

  1. Any ideas on how to limit the pipeline to just segmentation?
  2. Any code that could help me write my program?
rtruszk
Dreamer
  • What investigation have you done so far? Have you looked into the documentation? Sentence segmentation is a pretty basic task for GATE. – dedek May 12 '15 at 12:20
  • I used StandAloneANNIE and tried some code, but it is not working – Dreamer May 12 '15 at 12:33
  • You can also try to adapt [BatchProcessApp](http://gate.ac.uk/wiki/code-repository/src/sheffield/examples/BatchProcessApp.java) and use your own text segmenting pipeline... But nobody can help with a question as general as _"it is not working"_ ... – dedek May 12 '15 at 12:44
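The question is about a whole set of text files, and the comment above points to BatchProcessApp for running a pipeline over many documents. As a rough, JDK-only illustration of that batch loop (it does not use GATE or BatchProcessApp), the sketch below reads every .txt file directly inside a directory, assuming UTF-8 input, and splits each file into sentences with java.text.BreakIterator. The class and method names (DirectorySegmenter, segmentDirectory, splitSentences) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: batch sentence splitting over a directory, JDK only (no GATE).
public class DirectorySegmenter {

    // Returns file name -> sentences for every regular *.txt file directly inside dir.
    // Assumes UTF-8 input; the TreeMap keeps the result sorted by file name.
    public static Map<String, List<String>> segmentDirectory(Path dir, Locale locale) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(dir)) {
            files = listing
                    .filter(p -> Files.isRegularFile(p) && p.getFileName().toString().endsWith(".txt"))
                    .collect(Collectors.toList());
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Path file : files) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            result.put(file.getFileName().toString(), splitSentences(text, locale));
        }
        return result;
    }

    // Splits text at the boundaries reported by the JDK's sentence BreakIterator,
    // trimming surrounding whitespace and skipping empty spans.
    static List<String> splitSentences(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(locale);
        boundaries.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // Directory to process: the first argument, or the current directory
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, List<String>> entry : segmentDirectory(dir, Locale.ENGLISH).entrySet()) {
            System.out.println("== " + entry.getKey());
            for (String sentence : entry.getValue()) {
                System.out.println(sentence);
            }
        }
    }
}
```

Run it as java DirectorySegmenter path/to/texts. Only files directly inside the directory are read; subdirectories are not searched.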

1 Answer

Adapted from a different answer:

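import gate.*;
import gate.creole.SerialAnalyserController;
import java.io.File;
import java.util.*;

public class Segmenter {
    public static void main(String[] args) throws Exception {
        // Point GATE at its installation directory, then initialise it (before any other GATE call)
        Gate.setGateHome(new File("C:\\Program Files\\GATE_Developer_8.0"));
        Gate.init();
        // Load the ANNIE plugin, which provides the tokeniser and the sentence splitter
        registerGatePlugin("ANNIE");

        // Minimal pipeline: only a tokeniser and a sentence splitter, none of the other ANNIE components
        SerialAnalyserController pipeline = (SerialAnalyserController) Factory.createResource("gate.creole.SerialAnalyserController");
        pipeline.add((ProcessingResource) Factory.createResource("gate.creole.tokeniser.DefaultTokeniser"));
        pipeline.add((ProcessingResource) Factory.createResource("gate.creole.splitter.SentenceSplitter"));

        // Wrap the text in a document, add it to a corpus and run the pipeline over the corpus
        Corpus corpus = Factory.newCorpus("SegmenterCorpus");
        Document document = Factory.newDocument("Text to be segmented.");
        corpus.add(document);
        pipeline.setCorpus(corpus);
        pipeline.execute();

        // The splitter stores its results as "Sentence" annotations in the default annotation set
        AnnotationSet defaultAS = document.getAnnotations();
        AnnotationSet sentences = defaultAS.get("Sentence");

        // AnnotationSet iteration order is not guaranteed, so sort by offset before printing
        for (Annotation sentence : Utils.inDocumentOrder(sentences)) {
            System.out.println(Utils.stringFor(document, sentence));
        }

        // Clean up; iterate over a copy of the PR list so it is not modified while being iterated
        Factory.deleteResource(document);
        Factory.deleteResource(corpus);
        for (ProcessingResource pr : new ArrayList<>(pipeline.getPRs())) {
            Factory.deleteResource(pr);
        }
        Factory.deleteResource(pipeline);
    }

    // Registers a plugin directory (e.g. "ANNIE") from GATE's plugins folder
    public static void registerGatePlugin(String name) throws Exception {
        Gate.getCreoleRegister().registerDirectories(new File(Gate.getPluginsHome(), name).toURI().toURL());
    }
}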
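If a full GATE installation is more than you need, the JDK itself ships a rule-based sentence splitter, java.text.BreakIterator. The sketch below is an alternative to the GATE pipeline above, not part of GATE; the class name BreakIteratorSegmenter is hypothetical, and its rules are simpler than ANNIE's splitter, so it may, for example, split after abbreviations such as "Dr.".

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical JDK-only sentence segmenter (no GATE involved).
public class BreakIteratorSegmenter {

    // Splits text into sentences using the JDK's locale-sensitive sentence BreakIterator.
    public static List<String> split(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getSentenceInstance(locale);
        boundaries.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = boundaries.first();
        // Each pair of consecutive boundaries [start, end) encloses one sentence
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String sentence : split("This is the first sentence. Is this the second? Yes!", Locale.ENGLISH)) {
            System.out.println(sentence);
        }
    }
}
```

Trimming each span removes the trailing whitespace that BreakIterator attaches to the preceding sentence.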
Community
dedek
  • @Dreamer Did it help? ...you can up-vote the answer or mark it as correct. I never know what is working for you... ;-) – dedek May 14 '15 at 11:20