I have a question about using the GATE API with the ANNIE plugin. I call the GATE API from a Java program and it works well for up to 50 documents. But when I run it on more than 50 documents, it gives the following error:

Exception in thread "main" gate.creole.ExecutionException: No sentences or tokens to process in document GATE Document_0003D
Please run a sentence splitter and tokeniser first!
at gate.creole.POSTagger.execute(POSTagger.java:257)
at gate.util.Benchmark.executeWithBenchmarking(Benchmark.java:291)
at gate.creole.ConditionalSerialController.runComponent(ConditionalSerialController.java:163)
at gate.creole.SerialController.executeImpl(SerialController.java:157)
at gate.creole.ConditionalSerialAnalyserController.executeImpl(ConditionalSerialAnalyserController.java:244)
at gate.creole.ConditionalSerialAnalyserController.execute(ConditionalSerialAnalyserController.java:139)

I tried loading each component separately, but it still gives the same error. I also tried cleaning up the corpus after every 10 documents during processing, but the error is still there.

The code is:

public class MyGate  {
private CorpusController annieController;
/**
* Initialise the ANNIE system. This creates a "corpus pipeline"
* application that can be used to run sets of documents through
* the extraction system.
*/
public void initAnnie() throws GateException, IOException {
Out.prln("Initialising ANNIE...");

// load the ANNIE application from the saved state in plugins/ANNIE
File pluginsHome = Gate.getPluginsHome();
File anniePlugin = new File(pluginsHome, "ANNIE");
File annieGapp = new File(anniePlugin, "ANNIE_with_defaults.gapp");
annieController =
  (CorpusController) PersistenceManager.loadObjectFromFile(annieGapp);
Out.prln("...ANNIE loaded");
} // initAnnie()
public void cleanUp(){
Corpus corp= annieController.getCorpus();
if(!corp.isEmpty()){
for(int i=0;i<corp.size();i++){
Document doc1 = (Document)corp.remove(i);
corp.unloadDocument(doc1);
Factory.deleteResource(corp);
Factory.deleteResource(doc1);
}
}
}
/** Tell ANNIE's controller about the corpus you want to run on */
public void setCorpus(Corpus corpus) {
annieController.setCorpus(corpus);
} // setCorpus

/** Run ANNIE */
public void execute() throws GateException {

Out.prln("Running ANNIE...");

annieController.execute();
Out.prln("...ANNIE complete");
} // execute()


//////-------------------------------MAIN--------------------------------------///////
public static void main(String args[]) throws GateException, IOException {
ArrayList<CreateHashMap> train_data_list = new ArrayList<CreateHashMap>();

String workingDir = System.getProperty("user.dir");
System.out.println("Current working directory : " + workingDir);
String trainpath=workingDir+"/input/test.json/test.json";
/*********************************************/
try {
        // read the json file
        FileReader reader = new FileReader(trainpath);

        JSONParser jsonParser = new JSONParser();


        JSONArray a = (JSONArray) jsonParser.parse(new FileReader(trainpath));
                   int g=0; 
                   for (Object o : a)
                    {
                        if(g<=100){
                        CreateHashMap new_hash_item =new CreateHashMap();
                        JSONObject person = (JSONObject) o;

                        String rid = (String) person.get("request_id");
                        System.out.println(rid);

                        double date=(Double) person.get("times_request");
                        java.util.Date time=new java.util.Date((long)date*1000);

                        int day=time.getDate();

                        new_hash_item.createList(rid,day);
                        train_data_list.add(new_hash_item);

                    }
                    g++;}

    } catch (FileNotFoundException ex) {
        ex.printStackTrace();
    } catch (IOException ex) {
        ex.printStackTrace();
    } catch (ParseException ex) {
        ex.printStackTrace();
    } catch (NullPointerException ex) {
        ex.printStackTrace();
    }

  /****************************************/



// initialise the GATE library
Out.prln("Initialising GATE...");
Gate.setGateHome(new File("C:/Program Files/GATE_Developer_8.0"));
Gate.init();
Out.prln("...GATE initialised");

// initialise ANNIE (this may take several minutes)
StandAloneAnnie annie = new StandAloneAnnie();
annie.initAnnie();

// create a GATE corpus and add a document for each command-line
// argument

Corpus corpus = Factory.newCorpus("StandAloneAnnie corpus");
String pathdoc=workingDir+"/input/test.json/";
SentenceSplitter sp= new SentenceSplitter();
int countdoc=0;
for(int i = 0; i < train_data_list.size()/*args.length*/; i++) {
   Out.prln("here we go.............");  
  FeatureMap params = Factory.newFeatureMap();
  String text=train_data_list.get(i).get_Request_text();
params.put(gate.Document.DOCUMENT_STRING_CONTENT_PARAMETER_NAME, text);
Document doc=(gate.Document)Factory.createResource("gate.corpora.DocumentImpl",params);

params.put("preserveOriginalContent", new Boolean(true));
params.put("collectRepositioningInfo", new Boolean(true));
  corpus.add(doc);
  countdoc++;

  annie.setCorpus(corpus);
  annie.execute();
  if(countdoc==10)
  {
      corpus.cleanup();
      System.out.println("...............cleanup....................");
  }


} // for each of args


} // main


} // class MyGate

I am getting the error at this line:

annie.execute();

Kindly help me. I cannot figure out the problem.

Aroleena

3 Answers

This usually means that the "String text" passed to the document contains no tokens at all, for example only whitespace or special characters. Print out the document being processed (or its file name) and verify that it really has some sensible content.
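
A check like this could be done before the document is created. The following is only a minimal sketch in plain Java: the class name DocumentTextCheck and the method hasWordContent are hypothetical, and "at least one letter or digit" is a conservative heuristic chosen for illustration, not a statement about what the ANNIE tokeniser and sentence splitter actually produce for a given input.

```java
// Hypothetical helper, not part of GATE: a conservative pre-check on the raw text.
final class DocumentTextCheck {
    private DocumentTextCheck() {}

    /**
     * Returns true if the text contains at least one letter or digit.
     * Null, empty, whitespace-only and punctuation-only strings return false.
     */
    static boolean hasWordContent(String text) {
        if (text == null) {
            return false;
        }
        // codePoints() handles characters outside the Basic Multilingual Plane correctly.
        return text.codePoints().anyMatch(Character::isLetterOrDigit);
    }
}
```

In the loop from the question, the call would go right after the text is fetched, skipping the iteration (and logging the index) when hasWordContent(text) is false.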

ashingel
  • Thanks, OK :-). I added a condition on the document text so that a document is only added to the corpus if its text is not empty. That resolved the issue for 500 documents, but with more than 500 documents the same error is thrown again. Note: my corpus has 3000 docs; is there some other method I can use for such a large collection? – Aroleena Aug 21 '14 at 18:35
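
On the large-collection point: in the question's loop the corpus keeps growing and is re-run in full on every iteration, and the check countdoc==10 is true only once. One way to keep the working set bounded is to process fixed-size batches and empty the buffer after each one. The sketch below shows only that pattern with plain Java lists; BatchRunner and runInBatches are hypothetical names, and the handler stands in for the GATE-specific step (setting the corpus, executing the controller, then emptying the corpus and releasing its documents, for instance with Factory.deleteResource as the question's code already does).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of GATE: feeds items to a handler in bounded batches.
final class BatchRunner {
    private BatchRunner() {}

    /**
     * Passes the items to the handler in batches of at most batchSize,
     * clearing the buffer after every batch. Returns the number of batches run.
     * The handler must not keep a reference to the list it receives.
     */
    static <T> int runInBatches(List<T> items, int batchSize, Consumer<List<T>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> buffer = new ArrayList<>(batchSize);
        int batches = 0;
        for (T item : items) {
            buffer.add(item);
            if (buffer.size() == batchSize) {
                handler.accept(buffer);
                buffer.clear(); // the buffer never holds more than batchSize items
                batches++;
            }
        }
        if (!buffer.isEmpty()) { // final partial batch
            handler.accept(buffer);
            buffer.clear();
            batches++;
        }
        return batches;
    }
}
```

With a batch size of 1 this reduces to handling one document at a time, which keeps memory use lowest and makes it easy to tell which document triggers the exception.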

By default the POS tagger (and a number of other similar PRs) will fail with an exception if it can't find the input annotations it requires. This happens most frequently on completely empty documents. The POS tagger PR has a parameter failOnMissingInputAnnotations which controls this behaviour - it defaults to true but you can set it to false to cause the PR to do nothing on such documents rather than failing.

Ian Roberts
  • As I am new to GATE and ANNIE, can you tell me what changes I should make, which function to call, and where in the code? – Aroleena Aug 17 '14 at 19:11
  • @Aroleena the simplest approach would be to load the ANNIE application into GATE Developer, change the POS tagger parameter value (open the application, find the pos tagger in the right hand list, then set the parameter in the table below), then re-save the application to another file. In your code you then pass that file to the PersistenceManager in place of ANNIE_with_defaults. – Ian Roberts Aug 17 '14 at 20:16
  • I cannot find the failOnMissingInputAnnotations parameter in the ANNIE POS tagger. I am using GATE Developer 8.0. The ANNIE POS tagger has only 3 parameters: encoding, lexiconURL, rulesURL. – Aroleena Aug 21 '14 at 20:31
  • @Aroleena read my comment - it's a runtime parameter which you set in the application editor, not an init parameter you would set at creation time. – Ian Roberts Aug 21 '14 at 22:40

I think there is a problem with your gapp file, so you need to take care of that.

The sequence should be: English Tokeniser, Sentence Splitter, POS Tagger.

Vojtech Ruzicka