
From my Java program I load an application that I created in the GATE Developer GUI, and it annotates my documents and saves them in XML format. But now I cannot extract my data from the XML file. Can you help me, please?

    // Corpus that holds each document while the pipeline runs
    Corpus corpus = Factory.newCorpus("Processing CV");

    // Load the application saved from GATE Developer
    CorpusController application = (CorpusController) PersistenceManager
        .loadObjectFromUrl(new URL("file:////home/NafisehApp.xgapp"));
    application.setCorpus(corpus);

    File[] files = getFilesFromDir("/home/GAte/TestCV.pdf");
    for (int i = 0; i < files.length; i++) {
        if (!files[i].getName().endsWith(".pdf"))
            continue;
        File docFile = files[i];
        gate.Document doc = Factory.newDocument(docFile.toURI().toURL());
        corpus.add(doc);
        application.execute();

        AnnotationSet defaultAnnotSet = doc.getAnnotations();
        Set<String> annotTypesRequired = new HashSet<String>();
        annotTypesRequired.add("Person");
        annotTypesRequired.add("Address");
        annotTypesRequired.add("Title");
        File outputFile = new File("/home/GAte/file.xml");
        // Write the whole document as GATE XML
        DocumentStaxUtils.writeDocument(doc, outputFile);
        // Then write inline XML for the required annotation types to the same file
        FileUtils.write(outputFile, doc.toXml(doc.getAnnotations().get(annotTypesRequired), true));
    }
Vampir
  • It's not clear what you're trying to do here. You're annotating the document, then writing it out as GATE XML to a file, then immediately overwriting _the same file_ with the inline XML produced by `doc.toXml(annotTypesRequired)`. What exactly is the problem - what output are you getting and how does that differ from what you require? – Ian Roberts Nov 28 '13 at 16:52
  • Also what is `getFilesFromDir`? That looks like you're trying to enumerate a directory but the path you've given it points to a single file, so you'll probably find that `files` is an empty array, i.e. the for loop never runs. – Ian Roberts Nov 28 '13 at 16:54
  • I need to extract the data from my XML file into another format (PDF or Doc). My output is an XML file containing: 1. HTML tags holding the document's content type, title, creation date and creator name (the per-document information that the GATE Developer GUI creates by default in the application). 2. The annotations Person, Address and P (paragraph).

    XXXX
    Should I use CSS2XSLFO to extract the data and convert it to another format?
    – Vampir Nov 28 '13 at 17:17
  • `getFilesFromDir` is my own function that reads my directory of documents, and it works fine. – Vampir Nov 28 '13 at 17:17
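
The thread never shows the body of `getFilesFromDir`, so the following is only a minimal sketch of what such a helper might look like if it relies on `File.listFiles`. The class and method names (`PdfLister`, `listPdfFiles`) are hypothetical. It illustrates the point raised in the comments: when the path names a single file rather than a directory, `listFiles` returns `null`, which the sketch turns into an empty array, so a loop over the result would never run.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class PdfLister {
    // Hypothetical stand-in for getFilesFromDir: returns the PDF files directly inside dir.
    static File[] listPdfFiles(File dir) {
        File[] found = dir.listFiles((d, name) -> name.toLowerCase().endsWith(".pdf"));
        // listFiles returns null when dir is not a directory or cannot be read
        return found == null ? new File[0] : found;
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory with one PDF and one non-PDF file
        File dir = Files.createTempDirectory("cv").toFile();
        new File(dir, "TestCV.pdf").createNewFile();
        new File(dir, "notes.txt").createNewFile();

        // Passing the directory finds the PDF
        System.out.println(listPdfFiles(dir).length);
        // Passing the path of the PDF itself finds nothing
        System.out.println(listPdfFiles(new File(dir, "TestCV.pdf")).length);
    }
}
```

If the real helper behaves like this, passing the directory (for example `/home/GAte`) instead of the path of one PDF would be the thing to check first.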

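For reading the annotated values back out of the saved file, one possible approach is the JDK's own DOM parser rather than any GATE-specific API. The sketch below assumes the file holds inline XML that is well-formed with a single root element, which the thread does not confirm, and that the annotations appear as elements named after their type (`Person`, `Address`). The sample string, the `doc` root element, and the names `InlineXmlExtractor` and `textOf` are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class InlineXmlExtractor {
    // Returns the trimmed text content of every element named tagName in the XML string.
    static List<String> textOf(String xml, String tagName) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = dom.getElementsByTagName(tagName);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample standing in for the contents of the saved XML file
        String sample = "<doc><P><Person gateId=\"1\">Jane Doe</Person> lives at "
            + "<Address gateId=\"2\">12 Main Street</Address>.</P></doc>";
        System.out.println(textOf(sample, "Person"));
        System.out.println(textOf(sample, "Address"));
    }
}
```

Once the values are plain Java strings they can be written to whatever target format is needed. If the saved file has no single root element, it would have to be wrapped in one before a DOM parser will accept it.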