I have a Java API for word stemming, but I am unable to run it. I am working on an NLP project in Python 3.x where I read all the text from documents and split it into words. I want to use this Java API to stem the words before processing them further. I looked into calling a Java API directly from a Python program through different libraries and read a little about Py4J, but I could not get it to run. Can anyone please guide me on how to use this API from Python or, if that is not possible, how to use it in Eclipse?

Stemmer API Instructions:

Description: Word Stemmer API is a Java application that provides an interface to extract the stems, prefixes, and postfixes of words.

Setup: Copy the Data folder into your project directory and add the provided JAR file to your project.

Usage:

    1. loadRules()
        - Purpose:      This function loads the stemming rules from the ./Data/Rules.txt into the program.
        - Syntax:       void loadRules();
        - Parameters:   None
        - Return type:  Void


    2. stemWord()
        - Purpose:      This function accepts as input a single word and returns a HashMap containing its stem, prefix, and postfix.
        - Syntax:       HashMap<String, String> stemWord(String word);
        - Parameters:   String word to be stemmed
        - Return type:  HashMap with the following keys: "stem", "prefix", "postfix"

    3. stemFile()
        - Purpose:      This function accepts as input the path to a UTF-8 text file and writes a new file to the same directory with the suffix "_stemmed".
        - Syntax:       void stemFile(String path);
        - Parameters:   String path to text file
        - Return type:  Void

Example:

    UStemmer stmr = new UStemmer();

    stmr.loadRules();

    stmr.stemFile(path);    // path: String path to a UTF-8 text file

    HashMap<String, String> stemmed = stmr.stemWord(word);    // word: String to be stemmed

    String stem = stemmed.get("stem");
    String prefix = stemmed.get("prefix");
    String postfix = stemmed.get("postfix");

PS: The API folder I have contains a file UStemmer.JAR and two folders. The first is Data, which contains Rules.txt. The second is UStemmer, which contains two files: UStemmer.class (which I am unable to open or read) and MANIFEST.MF.

PPS: I cannot use any of the available stemmers because they do not support the language I am working with (Urdu, as used in Pakistan).
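For reference, this is roughly what I am trying to achieve from Python with Py4J. It is only a sketch under several assumptions that I could not verify: that UStemmer is in the default (unnamed) package so that it is reachable as gateway.jvm.UStemmer, that the installed Py4J version provides JavaGateway.launch_gateway with the classpath, cwd and die_on_exit arguments, and that the JVM must be started in the API folder so that ./Data/Rules.txt resolves. The helper names find_jar, connect and stem_words are my own and are not part of the API.

```python
from pathlib import Path


def find_jar(api_dir, jar_name="UStemmer.JAR"):
    """Return the absolute path of the JAR inside api_dir, or raise if missing."""
    jar = (Path(api_dir) / jar_name).resolve()
    if not jar.is_file():
        raise FileNotFoundError(f"JAR not found: {jar}")
    return jar


def connect(api_dir):
    """Start a JVM with the stemmer JAR on the classpath and load the rules.

    Assumption: UStemmer is in the default package. If it is in a named
    package, use gateway.jvm.<package>.UStemmer() instead.
    """
    # Third-party dependency (pip install py4j), imported lazily so the
    # pure-Python helpers in this file work without it.
    from py4j.java_gateway import JavaGateway

    jar = find_jar(api_dir)
    gateway = JavaGateway.launch_gateway(
        classpath=str(jar),
        cwd=str(Path(api_dir).resolve()),  # so ./Data/Rules.txt is found
        die_on_exit=True,                  # stop the JVM when Python exits
    )
    stemmer = gateway.jvm.UStemmer()
    stemmer.loadRules()
    return gateway, stemmer


def stem_words(stemmer, words):
    """Call stemWord() on each word and collect the results as plain dicts."""
    results = []
    for word in words:
        parts = stemmer.stemWord(word)  # map with "stem", "prefix", "postfix"
        results.append({
            "word": word,
            "stem": parts.get("stem"),
            "prefix": parts.get("prefix"),
            "postfix": parts.get("postfix"),
        })
    return results
```

The intended usage would be gateway, stemmer = connect("path/to/API folder"), then stem_words(stemmer, my_words), and finally gateway.shutdown(). Copying the values into plain dicts means the rest of the Python pipeline does not need to hold on to Java objects.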

  • Would it be easier to use the Natural Language Toolkit that's written in Python? I love Java, but there are perfectly good tools in Python for this task. – duffymo Apr 22 '18 at 11:55
  • NLTK does not support Urdu stemming. I have mentioned this problem in my question. The only way to stem Urdu words is this API, which has been developed locally in Java. I only got its API to use. PS: I am familiar with NLTK and its libraries, including the different types of stemmers it provides, but they are useless for me in this project. – Muhammad Sulaman Toor Apr 22 '18 at 11:57
  • "Unable to run it" is not enough information. What did you try to do with PY4J? Did you set up a gateway service? Where is the code? Why were you unable to run it? – RealSkeptic Apr 22 '18 at 12:31

0 Answers