I have a Java API for word stemming, but I am unable to run it. I am working on an NLP project in Python 3.x where I read all the text from documents and split it into words. I want to use this Java API to stem those words and then process them further. I have looked into calling a Java API directly from a Python program with different libraries and read a little about Py4J, but I could not get it to run. Can anyone please guide me on how to use this API from Python or, if that is not possible, how to use it in Eclipse?
Stemmer API Instructions:
Description: The Word Stemmer API is a Java application that provides an interface for extracting the stems, prefixes, and postfixes of words.
Setup: Copy the Data folder into your project directory and add the provided JAR file to your project.
Usage:
1. loadRules()
- Purpose: This function loads the stemming rules from the ./Data/Rules.txt into the program.
- Syntax: void loadRules();
- Parameters: None
- Return type: Void
2. stemWord()
- Purpose: This function accepts as input a single word and returns a HashMap containing its stem, prefix, and postfix.
- Syntax: HashMap<String, String> stemWord(String word);
- Parameters: String word to be stemmed
- Return type: HashMap with the following keys: "stem", "prefix", "postfix"
3. stemFile()
- Purpose: This function accepts as input the path to a UTF-8 text file and writes a new file to the same directory with the suffix "_stemmed".
- Syntax: void stemFile(String path);
- Parameters: String path to text file
- Return type: Void
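Because loadRules() reads the relative path ./Data/Rules.txt, it can only find the rules when the program is started from the directory that contains Data: Java resolves relative paths against the JVM's working directory (the user.dir system property), not against the location of the JAR. A minimal pre-flight check, assuming loadRules() opens that relative path exactly as documented (RulesCheck and its methods are hypothetical helpers, not part of the API):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: checks whether Data/Rules.txt is reachable from a
 * working directory, assuming loadRules() opens ./Data/Rules.txt as documented.
 */
public class RulesCheck {

    /** Resolves Data/Rules.txt against the given directory. */
    static Path rulesPath(Path workingDir) {
        return workingDir.resolve("Data").resolve("Rules.txt");
    }

    /** True if the rules file exists as a regular file under workingDir. */
    static boolean rulesReachable(Path workingDir) {
        return Files.isRegularFile(rulesPath(workingDir));
    }

    public static void main(String[] args) {
        // Relative paths such as ./Data/Rules.txt resolve against user.dir.
        Path cwd = Paths.get(System.getProperty("user.dir"));
        Path rules = rulesPath(cwd);
        if (rulesReachable(cwd)) {
            System.out.println("Found rules file: " + rules);
        } else {
            System.out.println("Missing rules file: " + rules
                    + " (start the program from the folder that contains Data)");
        }
    }
}
```

In Eclipse, a Java run configuration uses the project folder as its working directory by default (Run Configurations > Arguments > Working directory), so copying Data to the project root and adding UStemmer.JAR through Build Path > Add External Archives... should match the setup step.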
Example:
import java.util.HashMap;

UStemmer stmr = new UStemmer();
stmr.loadRules();                                        // reads ./Data/Rules.txt
stmr.stemFile(path);                                     // path: String path to a UTF-8 text file
HashMap<String, String> stemmed = stmr.stemWord(word);   // word: String to be stemmed
String stem = stemmed.get("stem");
String prefix = stemmed.get("prefix");
String postfix = stemmed.get("postfix");
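One way to use the API from Python without Py4J is to wrap it in a small Java program and exchange words over stdin/stdout. The sketch below is hypothetical (StemBridge and its methods are not part of the API). It assumes the class is named UStemmer in the default package, has a public no-argument constructor, and that stemWord returns a map with the documented "stem", "prefix" and "postfix" keys. UStemmer is loaded by reflection so the file compiles without the JAR; pass the fully qualified class name as the first argument if it differs.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.Writer;
import java.lang.reflect.Method;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical bridge: reads one word per line (UTF-8) from stdin and writes
 * "word<TAB>prefix<TAB>stem<TAB>postfix" per word to stdout.
 */
public class StemBridge {

    /** Stems every non-empty line of in and writes one tab-separated row per word. */
    static void stemAll(Reader in, Writer out,
                        Function<String, Map<String, String>> stemmer) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        PrintWriter writer = new PrintWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            String word = line.trim();
            if (word.isEmpty()) {
                continue; // skip blank lines
            }
            Map<String, String> parts = stemmer.apply(word);
            writer.println(String.join("\t", word,
                    nullToEmpty(parts.get("prefix")),
                    nullToEmpty(parts.get("stem")),
                    nullToEmpty(parts.get("postfix"))));
        }
        writer.flush();
    }

    private static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    /** Creates the stemmer by reflection, calls loadRules() once, and wraps stemWord. */
    @SuppressWarnings("unchecked")
    static Function<String, Map<String, String>> loadStemmer(String className) throws Exception {
        Class<?> cls = Class.forName(className); // assumed class name, see lead-in
        Object stmr = cls.getDeclaredConstructor().newInstance();
        cls.getMethod("loadRules").invoke(stmr); // reads ./Data/Rules.txt
        Method stemWord = cls.getMethod("stemWord", String.class);
        return word -> {
            try {
                return (Map<String, String>) stemWord.invoke(stmr, word);
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("stemWord failed for: " + word, e);
            }
        };
    }

    public static void main(String[] args) throws Exception {
        String className = args.length > 0 ? args[0] : "UStemmer";
        // Explicit UTF-8 on both streams so Urdu text survives the round trip.
        stemAll(new InputStreamReader(System.in, StandardCharsets.UTF_8),
                new OutputStreamWriter(System.out, StandardCharsets.UTF_8),
                loadStemmer(className));
    }
}
```

Compile it with javac StemBridge.java and run it from the folder that contains Data with java -cp UStemmer.JAR:. StemBridge (use ; instead of : as the classpath separator on Windows). From Python 3.7+, subprocess.run(cmd, input="\n".join(words), capture_output=True, encoding="utf-8") returns the rows in .stdout, and each row can be split on "\t". Stemming all words in one process loads the rules once instead of starting a JVM per word. Py4J follows the same idea: an object wrapping UStemmer is passed to py4j.GatewayServer, started with start(), and reached from Python through JavaGateway().entry_point.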
PS: The API folder I have contains a file UStemmer.JAR and two folders. The first, Data, contains the Rules.txt file; the second, UStemmer, contains two files: UStemmer.class (which I am unable to open or read) and MANIFEST.MF.
PPS: I cannot use any of the available stemmers because none of them supports the language I am working on (Urdu, Pakistan).
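UStemmer.class is compiled bytecode, so it cannot be read as text, but the JAR's entry names reveal the class's fully qualified name. The folder layout described above (UStemmer/UStemmer.class) hints that the class may sit in a package called UStemmer, in which case Java code has to refer to it as UStemmer.UStemmer (for example import UStemmer.UStemmer;). A small sketch using only java.util.jar that lists the classes in the JAR and the manifest's Main-Class (JarInspect is a hypothetical helper):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical helper: prints class names and Main-Class found in a JAR. */
public class JarInspect {

    /** Converts "a/b/C.class" entries to "a.b.C", skipping nested classes ($). */
    static List<String> classNames(JarFile jar) {
        List<String> names = new ArrayList<>();
        Enumeration<JarEntry> entries = jar.entries();
        while (entries.hasMoreElements()) {
            String name = entries.nextElement().getName();
            if (name.endsWith(".class") && !name.contains("$")) {
                names.add(name.substring(0, name.length() - ".class".length())
                        .replace('/', '.'));
            }
        }
        return names;
    }

    /** Returns the manifest's Main-Class attribute, or null if there is none. */
    static String mainClass(JarFile jar) throws IOException {
        Manifest mf = jar.getManifest();
        return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "UStemmer.JAR";
        try (JarFile jar = new JarFile(path)) {
            for (String cls : classNames(jar)) {
                System.out.println("class: " + cls);
            }
            System.out.println("Main-Class: " + mainClass(jar));
        }
    }
}
```

After javac JarInspect.java, run java JarInspect UStemmer.JAR. Once the class name is known, the JDK's javap tool (for example javap -cp UStemmer.JAR UStemmer.UStemmer, adjusted to the name found) prints the public method signatures, which can be compared with the API instructions.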