I want to use TermRaider features with GATE. Could someone please post some sample code showing how to load and use this resource from a Java class? I tried the following, but it failed.

 Gate.getCreoleRegister().registerDirectories(new URL("file:///D:/misc_workspace/gate-7.1-build4485-SRC/plugins/TermRaider"));

    ProcessingResource termRaider = (ProcessingResource) Factory.
    createResource("gate.termraider.TermRaiderEnglish",Factory.newFeatureMap());

Exception:
gate.termraider.TermRaiderEnglish cannot be cast to gate.ProcessingResource

Could anyone please suggest how I should proceed?

abhijit nag

2 Answers

The TermRaider system isn't a single PR, it's a whole application (in fact a Groovy ScriptableController). That is why the cast to ProcessingResource fails: the TermRaiderEnglish resource is just a hook to make that application appear in the "ready-made applications" menu of the GATE Developer GUI.

In embedded code you can load the application using the PersistenceManager:

File termRaiderPlugin = new File(Gate.getPluginsHome(), "TermRaider");
File gappFile = new File(new File(termRaiderPlugin, "applications"),
            "termraider-eng.gapp");
CorpusController trApp = (CorpusController)PersistenceManager.loadObjectFromFile(
    gappFile);

When you run the application over a corpus, it creates new instances of three "termbank" LRs containing the information about the newly discovered terms. The vanilla application is really intended for GUI rather than embedded use so it doesn't store references to these new LRs anywhere useful - you'll have to interrogate the CreoleRegister to find them. You might prefer to make your own copy of the application and tweak the control script to store the termbank instances as (say) features on the Corpus, by adding something like

corpus.features.tfidfTermbank = termbank0
corpus.features.annotationTermbank = termbank1
corpus.features.hyponymyTermbank = termbank2

to the end of the control script. You could then access them in your Java code via corpus.getFeatures().get("tfidfTermbank") etc.
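
Bear in mind that those features only exist if the control script was modified as described, so the lookup can come back null. As a rough sketch (FeatureMap is a java.util.Map, so a plain HashMap stands in for corpus.getFeatures() here; FeatureLookup and typedFeature are made-up names, not part of GATE), a small helper can do the null check and the type check in one place:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper, not part of GATE.
final class FeatureLookup {

    // Returns the feature only if it is present and of the expected type.
    static <T> Optional<T> typedFeature(Map<?, ?> features, Object key, Class<T> type) {
        Object value = features.get(key);
        // isInstance(null) is false, so a missing feature yields Optional.empty()
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Stand-in for corpus.getFeatures(); real code would pass the FeatureMap itself.
        Map<Object, Object> features = new HashMap<>();
        features.put("tfidfTermbank", "stand-in termbank");

        System.out.println(typedFeature(features, "tfidfTermbank", String.class).isPresent());
        System.out.println(typedFeature(features, "hyponymyTermbank", String.class).isPresent());
    }
}
```

In the real application you would pass the termbank class (for example AbstractTermbank.class) instead of String.class.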

Since these Termbank classes are themselves part of the TermRaider plugin, you'll probably want to add gate-termraider.jar to your main application classpath rather than loading it via the GateClassLoader.
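
If you stay with the unmodified application and go the CreoleRegister route instead, having the jar on the main classpath also lets you pick the termbanks out by type. This is only a sketch under assumptions: the list below is a stand-in for what Gate.getCreoleRegister().getLrInstances() would return, String.class stands in for the termbank class, and ResourceFilter / instancesOf are made-up names rather than GATE API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical helper, not part of GATE.
final class ResourceFilter {

    // Keeps only the elements that are instances of the given type, in order.
    static <T> List<T> instancesOf(Collection<?> resources, Class<T> type) {
        List<T> matches = new ArrayList<>();
        for (Object resource : resources) {
            if (type.isInstance(resource)) {
                matches.add(type.cast(resource));
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for the loaded LR instances; in real code the type would be
        // a termbank class from gate-termraider.jar rather than String.
        List<Object> loaded = Arrays.asList("termbank-a", 42, "termbank-b");
        System.out.println(instancesOf(loaded, String.class));
    }
}
```

Note that the register holds every loaded LR, including termbanks left over from earlier runs, so you may need to narrow the result further (by name, say).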

Ian Roberts
import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.FeatureMap;
import gate.Gate;
import gate.termraider.bank.AbstractTermbank;
import gate.termraider.output.CsvGenerator;
import gate.util.GateException;
import gate.util.Out;
import gate.util.persistence.PersistenceManager;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLDecoder;


public class termraider {
    public static void main(String[] args) throws IOException, GateException {

        // initialise the GATE library
        Out.prln("Initialising GATE...");
        Gate.init();
        Out.prln("...GATE initialised");

        //Load TermRaider plugin
        File termRaiderPlugin = new File(Gate.getPluginsHome(), "TermRaider");
        File gappFile = new File(new File(termRaiderPlugin, "applications"),
                   "termraider-eng.gapp");
        CorpusController trApp = (CorpusController)PersistenceManager.loadObjectFromFile(gappFile);
        System.out.println("TermRaider loaded successfully!!!");


        // Load every file in a folder into a new corpus
        Corpus corpus = (Corpus) Factory.createResource("gate.corpora.CorpusImpl");
        //String dirname = "Desktop/Gate_corpus/About Us/New Folder";
        String dirname = "Desktop/GermanHPFCompetition/termRaider";
        File[] files = new File(dirname).listFiles();
        if (files == null) {
            throw new IOException("Cannot list directory: " + dirname);
        }
        for (File file : files) {
            if (!file.isFile()) continue; // skip subdirectories
            // File.toURI() gives a well-formed, correctly escaped file: URL
            URL u = file.toURI().toURL();

            FeatureMap params = Factory.newFeatureMap();
            params.put("sourceUrl", u);
            params.put("preserveOriginalContent", Boolean.TRUE);
            params.put("collectRepositioningInfo", Boolean.TRUE);
            //Out.prln("Creating doc for " + u);
            Document doc = (Document)
                Factory.createResource("gate.corpora.DocumentImpl", params);
            corpus.add(doc);
        } // for each file in the folder

        //running TermRaider plugin with the corpus
        trApp.init();
        trApp.setCorpus(corpus);
        trApp.execute();
        // the controller hands back the same corpus it was given, so no new corpus is needed
        Corpus output_corpus = trApp.getCorpus();
        System.out.println("TermRaider executed successfully!!!");

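        //Creating csv files as output
        //(these features are only set if the control script was modified to store the termbanks on the corpus)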
        AbstractTermbank tb1 = (AbstractTermbank) output_corpus.getFeatures().get("tfidfTermbank");
        AbstractTermbank tb2 = (AbstractTermbank) output_corpus.getFeatures().get("hyponymyTermbank");
        AbstractTermbank tb3 = (AbstractTermbank) output_corpus.getFeatures().get("annotationTermbank");

        System.out.println(tb1);
        System.out.println(tb2);
        System.out.println(tb3);

        CsvGenerator generator = new CsvGenerator();
        File outputFile1 = new File("Desktop/GermanHPFCompetition/termRaider/tfidfTermbank.csv");
        File outputFile2 = new File("Desktop/GermanHPFCompetition/termRaider/hyponymyTermbank.csv");
        File outputFile3 = new File("Desktop/GermanHPFCompetition/termRaider/annotationTermbank.csv");
        double threshold1 = 0;
        double threshold2 = 0;
        double threshold3 = 0;
        generator.generateAndSaveCsv(tb1, threshold1, outputFile1);
        generator.generateAndSaveCsv(tb2, threshold2, outputFile2);
        generator.generateAndSaveCsv(tb3, threshold3, outputFile3);
        System.out.println("CSV files created!!!");

    }//end of main

}//end of class
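
A note on building the sourceUrl values: gluing "file:" onto a path string leaves spaces and other special characters unescaped, whereas File.toURI().toURL() yields an absolute, escaped file: URL on any platform. A minimal JDK-only sketch (FileUrls, toFileUrl and the sample path are made up for illustration):

```java
import java.io.File;
import java.io.UncheckedIOException;
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration only.
final class FileUrls {

    // Converts a (possibly relative) path to an absolute, escaped file: URL.
    static URL toFileUrl(String path) {
        try {
            return new File(path).toURI().toURL();
        } catch (MalformedURLException e) {
            // Not expected for file: URIs; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Spaces in the path come out as %20; the prefix depends on the working directory.
        System.out.println(toFileUrl("my corpus/doc 1.txt"));
    }
}
```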
Vanaja Jayaraman