I am trying to write my own Java program to POS-tag a set of text files. I searched the available NLP tools and found that GATE is one of the best for text processing. I want to download it, but I don't want to use the GUI. I am looking to use it in my own Java program.

So:

  1. How can I connect GATE to NetBeans?
  2. How can I use part-of-speech tagging in my code?

I am new to NLP and GATE; I only got started a few hours ago. I am a PhD student in text mining, and I need to work with NLP tools for my research. I hope you can point me to a tutorial on how to integrate GATE with Java, so that I can use its libraries from my own code.

Thank you for your time and consideration.

Q.R

1 Answer

The best tutorial material is the handouts from the regular training course, which are available at http://gate.ac.uk/wiki (look for the latest "training course participants' wiki"). In particular module 5 talks about calling the GATE APIs from Java code.

"I don't want to use the GUI. I am looking to use it in my own Java program."

Even if you don't want to use the GUI in your production system, we always recommend that you get your pipeline set up and tested in the GUI to start with. When you're happy that it does what you want, use "save application state" or "export for GATECloud.net" to save the application. Your code can then call PersistenceManager.loadObjectFromFile to load the fully configured pipeline, without having to load the correct plugins and assemble the pipeline components by hand.
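As a rough sketch of what the loading code might look like: the class below initialises GATE, loads a saved application with PersistenceManager.loadObjectFromFile, runs it over each text file, and prints every token with its POS tag. It assumes the GATE Embedded libraries are on the classpath, that the saved application contains the ANNIE tokeniser, sentence splitter and POS tagger (so tokens are "Token" annotations carrying "string" and "category" features), and that the file paths come from the command line. The class name PosTagFiles is made up for illustration, and depending on your GATE version you may need extra configuration before Gate.init().

```java
import gate.Annotation;
import gate.Corpus;
import gate.CorpusController;
import gate.Document;
import gate.Factory;
import gate.Gate;
import gate.Utils;
import gate.util.persistence.PersistenceManager;

import java.io.File;

// Hypothetical example class; requires the GATE Embedded libraries on the classpath.
public class PosTagFiles {
    public static void main(String[] args) throws Exception {
        // args[0]: application saved from the GUI ("save application state")
        // args[1..]: plain-text files to tag
        Gate.init();

        // Load the fully configured pipeline, plugins included.
        CorpusController app = (CorpusController)
                PersistenceManager.loadObjectFromFile(new File(args[0]));

        Corpus corpus = Factory.newCorpus("corpus");
        app.setCorpus(corpus);

        for (int i = 1; i < args.length; i++) {
            Document doc = Factory.newDocument(new File(args[i]).toURI().toURL());
            corpus.add(doc);
            app.execute();   // runs every component over the corpus
            corpus.clear();

            // Assumption: ANNIE conventions, i.e. "Token" annotations whose
            // "category" feature holds the POS tag.
            for (Annotation token : Utils.inDocumentOrder(doc.getAnnotations().get("Token"))) {
                System.out.println(token.getFeatures().get("string")
                        + "\t" + token.getFeatures().get("category"));
            }
            Factory.deleteResource(doc);   // free the document when done
        }
        Factory.deleteResource(corpus);
    }
}
```

In NetBeans this amounts to adding the GATE libraries to the project's classpath and running the class with the saved application file as the first argument.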

Ian Roberts
  • Thank you for your comments and answer. Sorry, I have a general question about "pipeline": what does it really mean? I am new to this area, so I do not know much yet. Thank you, and I hope I can get more useful information from you. – Q.R Oct 14 '13 at 18:19
  • @QusaiRamadan a "pipeline" is the name we use for a sequence of components that run one after the other to process a document. For example, for POS tagging you first need a Tokeniser to split the text into words, a sentence splitter to group tokens into sentences, and then the actual POS tagger to assign tags to the tokens. – Ian Roberts Oct 14 '13 at 18:21
  • Thank you. I am installing GATE now, and I will carefully read what you have sent me. I hope we can keep in touch, because I can see you have a good background in this topic. Thank you for your time and consideration. – Q.R Oct 14 '13 at 18:27
  • @QusaiRamadan the best thing to do if you have questions is to subscribe to the mailing list (http://gate.ac.uk/mail) and post them there; it has a much wider audience among the GATE community than you can reach on Stack Overflow. – Ian Roberts Oct 14 '13 at 18:36
  • OK, I will do that right now. Thank you for your advice; as I said, I will be happy if we can keep in contact. – Q.R Oct 14 '13 at 18:47
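
To make the "pipeline" idea from the comments concrete, here is a toy sketch in plain Java with no GATE dependency. It only illustrates the shape: a tokeniser, a sentence splitter and a tagger run one after the other over the same document, each building on what the previous stage produced. All names (ToyPipeline, Stage, Doc, Token) are invented for this example, and the "tagger" is a trivial rule that labels tokens WORD, NUM or PUNCT; it is not a real POS tagger and not how GATE implements its components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyPipeline {
    /** One token; later stages fill in the sentence index and the tag. */
    static final class Token {
        final String text;
        int sentence = -1;
        String tag = null;
        Token(String text) { this.text = text; }
    }

    /** The document being processed: raw text plus accumulated tokens. */
    static final class Doc {
        final String text;
        final List<Token> tokens = new ArrayList<>();
        Doc(String text) { this.text = text; }
    }

    /** A pipeline component: reads and enriches the document in place. */
    interface Stage {
        void process(Doc doc);
    }

    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    // Stage 1: split the text into word and punctuation tokens.
    static final Stage TOKENISER = doc -> {
        Matcher m = TOKEN.matcher(doc.text);
        while (m.find()) {
            doc.tokens.add(new Token(m.group()));
        }
    };

    // Stage 2: group tokens into sentences, ending each at '.', '!' or '?'.
    static final Stage SENTENCE_SPLITTER = doc -> {
        int sentence = 0;
        for (Token t : doc.tokens) {
            t.sentence = sentence;
            if (t.text.matches("[.!?]")) {
                sentence++;
            }
        }
    };

    // Stage 3: toy "tagger" (illustrative rule only, not a real POS tagger).
    static final Stage TAGGER = doc -> {
        for (Token t : doc.tokens) {
            if (t.text.matches("\\d+")) {
                t.tag = "NUM";
            } else if (t.text.matches("\\w+")) {
                t.tag = "WORD";
            } else {
                t.tag = "PUNCT";
            }
        }
    };

    /** Runs the stages in order over a new document. */
    static Doc run(String text) {
        Doc doc = new Doc(text);
        for (Stage stage : Arrays.asList(TOKENISER, SENTENCE_SPLITTER, TAGGER)) {
            stage.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        for (Token t : run("Dogs bark. Cats sleep.").tokens) {
            System.out.println(t.sentence + "\t" + t.text + "\t" + t.tag);
        }
    }
}
```

The order matters: the splitter and the tagger both rely on tokens already being present, which is why a saved GATE application stores the components as an ordered sequence.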