This problem is really driving me crazy.

TO ANSWER MOST OF WHAT PEOPLE THINK: YES I ADDED snowball.jar TO THE CLASSPATH

I have a simple main class that is supposed to stem the word "going" to "go":

import weka.core.stemmers.SnowballStemmer;

public class StemmerTest {
    public static void main(String[] args) {
        SnowballStemmer stemmer = new SnowballStemmer();
        stemmer.setStemmer("english");
        System.out.println(stemmer.stem("going"));
    }
}

When I run it in Eclipse, it works and I get the following output:

Refreshing GOE props...
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...
go

However, when I export it from Eclipse as a runnable jar ("stem.jar") and execute it in the terminal with "java -jar stem.jar", it doesn't work and I get the following output:

Refreshing GOE props...
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...
Stemmer 'porter' unknown!
Stemmer 'english' unknown!
going

I have no idea why snowball.jar is not recognized in the exported jar, even though both weka.jar and snowball.jar are included in it. Here is the stem.jar file structure:

stem.jar
       |
       |---META-INF
       |---org
       |---StemmerTest.class
       |---snowball.jar
       |---weka.jar

I would appreciate any help with this problem.
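One way to double-check what actually ended up in the exported archive is to list its entries and see whether the Snowball classes are present as plain .class files or only inside a nested jar. The sketch below is a minimal, hypothetical helper: the class name JarLayoutCheck is made up, and the package prefix org/tartarus/snowball/ is an assumption about how the classes are packaged inside snowball.jar, so adjust it if your jar differs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JarLayoutCheck {

    /** Returns the entries that are themselves jars, e.g. "snowball.jar". */
    public static List<String> nestedJars(List<String> entryNames) {
        List<String> result = new ArrayList<>();
        for (String name : entryNames) {
            if (name.endsWith(".jar")) {
                result.add(name);
            }
        }
        return result;
    }

    /** True if at least one .class entry lies under the given package path (subpackages included). */
    public static boolean hasFlattenedPackage(List<String> entryNames, String packagePath) {
        for (String name : entryNames) {
            if (name.startsWith(packagePath) && name.endsWith(".class")) {
                return true;
            }
        }
        return false;
    }

    /** Reads all entry names from the jar at the given path. */
    public static List<String> entryNames(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the jar name used in the question; pass another path as the first argument.
        String jarPath = args.length > 0 ? args[0] : "stem.jar";
        List<String> names = entryNames(jarPath);
        System.out.println("Nested jars: " + nestedJars(names));
        // ASSUMPTION: the Snowball classes live under org/tartarus/snowball/.
        System.out.println("Snowball classes at top level: "
                + hasFlattenedPackage(names, "org/tartarus/snowball/"));
    }
}
```

Run it from the directory that contains the exported jar, e.g. "java JarLayoutCheck stem.jar". If snowball.jar shows up only as a nested jar, its classes are not stored as ordinary entries of stem.jar.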

EDIT 1: Generated ANT Script:

<project default="create_run_jar" name="Create Runnable Jar for Project StemmerTest with Jar-in-Jar Loader">
<!--this file was created by Eclipse Runnable JAR Export Wizard-->
<!--ANT 1.7 is required                                        -->
<target name="create_run_jar">
    <jar destfile="stem.jar">
        <manifest>
            <attribute name="Main-Class" value="org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader"/>
            <attribute name="Rsrc-Main-Class" value="StemmerTest"/>
            <attribute name="Class-Path" value="."/>
            <attribute name="Rsrc-Class-Path" value="./ snowball-2012.jar weka.jar snowball.jar"/>
        </manifest>
        <zipfileset src="jar-in-jar-loader.zip"/>
        <zipfileset dir="resources/lib" includes="snowball-2012.jar"/>
        <fileset dir="bin"/>
        <zipfileset dir="." includes="weka.jar"/>
        <zipfileset dir="." includes="snowball.jar"/>
    </jar>
</target>
</project>

EDIT 2:

Here is the content of MANIFEST.MF as requested.

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.1
Created-By: 23.25-b01 (Oracle Corporation)
Main-Class: org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader
Rsrc-Main-Class: StemmerTest
Rsrc-Class-Path: ./ weka.jar snowball.jar
Class-Path: .

Thanks in advance, TeFa

  • Well it's not complaining about weka and snowball not being in the classpath, because they are. It is complaining about a JDBC driver not being in the classpath. Maybe weka and snowball have config files which you can edit, and remove the database dependency. I don't know anything about weka or snowball. – NickJ Jun 21 '13 at 15:06
  • Running a jar file that also contains dependent jars within it is non-trivial. It's not just a matter of exporting the code into a jar file. You have to craft a fairly hideous MANIFEST.MF file and include it. This link ( http://docs.oracle.com/javase/1.5.0/docs/guide/jar/jar.html#Name-Value pairs and Sections ) is a bit old but points in the right direction. If you are using Maven, it has good tooling to take away some of the pain. – DaveH Jun 21 '13 at 15:08
  • When the output is "Stemmer 'porter' unknown!", it means that weka didn't find the snowball package in the classpath. [Wiki](http://weka.wikispaces.com/The+snowball+stemmers+don%27t+work%2C+what+am+I+doing+wrong%3F) – TeFa Jun 21 '13 at 15:09
  • @DaveHowes I am using the ant script generated by Eclipse; I have now included it in my question. I am not an expert in ant or maven at all, which is why I am using the script generated by Eclipse. If you have any idea how to edit the script to make it work, I would really appreciate it :) – TeFa Jun 21 '13 at 17:10
  • Can you post the contents of your manifest.mf file? – DaveH Jun 21 '13 at 17:42
  • I don't like the look of that Rsrc-Class-Path attribute. See if this question helps: http://stackoverflow.com/questions/858766/generate-manifest-class-path-from-classpath-in-ant – DaveH Jun 21 '13 at 18:05

4 Answers

Although the root cause is still not clear to me, I managed to solve this annoying problem (after ~10 hours -.-) by doing the following:

  • Using "zipgroupfileset" instead of "fileset" for "snowball.jar", so that its contents are flattened (extracted) into the generated jar file.

  • Excluding "snowball.jar" from the manifest classpath (since its contents are already included in the generated jar file).

For some unknown reason, the Snowball wrapper in weka.jar couldn't find snowball.jar until it was flattened (extracted).

Here is the ant script that works for me:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<project default="jar">
    <path id="dep.runtime">
        <fileset dir="./libs">
            <include name="**/*.jar" />
            <exclude name="**/snowball.jar"/>
        </fileset>
    </path>

    <manifestclasspath property="manifest_cp" jarfile="stem.jar">
        <classpath refid="dep.runtime" />
    </manifestclasspath>

    <target name="jar">
        <jar destfile="stem.jar">
            <manifest>
                <attribute name="Main-Class" value="StemmerTest"/>
                <attribute name="Class-Path" value="${manifest_cp}"/>
            </manifest>
            <zipgroupfileset dir="./libs" includes="snowball.jar"/>
            <fileset dir="bin"/>
        </jar>
    </target>
</project>
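A quick way to check whether a packaged jar can see the Snowball classes at run time is to try loading one by name and to print the effective classpath. This is only a diagnostic sketch: the class name ClasspathProbe is made up, and org.tartarus.snowball.ext.englishStemmer is an assumption about how the English stemmer class is named inside snowball.jar, so check the jar's contents if it differs. Weka may discover stemmers differently from a plain Class.forName call, so treat a positive result as a hint, not proof.

```java
public class ClasspathProbe {

    /** Returns true if the named class can be loaded by the loader that loaded this class. */
    public static boolean isLoadable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // ASSUMPTION: default class name of the English Snowball stemmer; override via the first argument.
        String candidate = args.length > 0 ? args[0] : "org.tartarus.snowball.ext.englishStemmer";
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
        System.out.println(candidate + " loadable: " + isLoadable(candidate));
    }
}
```

Adding this class to the project and calling ClasspathProbe.main from StemmerTest, once inside Eclipse and once from the exported jar, shows whether the two environments differ in what they can load.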

Hope this helps anyone else using the Snowball stemmer.

TeFa
  • Hi, I am facing the same issue. I tried the approach described above, but I am still getting the error. Do you have any idea how I can solve it? – Neha Nov 29 '16 at 13:46

I got it working after an hour of testing, as there is nothing about this on the wiki. The solution goes like this:

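import weka.core.SelectedTag;
import weka.core.stemmers.SnowballStemmer;
import weka.filters.unsupervised.attribute.StringToWordVector;

// Configure the stemmer and attach it to a StringToWordVector filter.
SnowballStemmer stemmer = new SnowballStemmer();
stemmer.setStemmer("English");
StringToWordVector STWfilter = new StringToWordVector(1000);
STWfilter.setUseStoplist(true);
STWfilter.setIDFTransform(true);
STWfilter.setTFTransform(true);
STWfilter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL, StringToWordVector.TAGS_FILTER));
STWfilter.setOutputWordCounts(true);
STWfilter.setStemmer(stemmer);
// train is the training set (weka.core.Instances), loaded elsewhere.
STWfilter.setInputFormat(train);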

I posted the whole example to save you the hour I spent getting this right.

Alex Arvanitidis

I had the same problem with Snowball when using multithreading. I solved it like this:

SnowballStemmer st = new SnowballStemmer();
do {
    // wait until the German stemmer is initialized
} while (!st.stemmerTipText().contains("german"));
st.setStemmer("german");
filter.setStemmer(st);

The error message "Stemmer 'porter' unknown!" will still appear, but the requested stemmer (the German one, for example) will be set correctly.
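The busy-wait loop above spins at full speed and never gives up if the stemmer list is never populated. A bounded variant is sketched below using a hypothetical helper, waitUntil, which is not part of Weka: it sleeps between checks and returns false once a timeout elapses. With the snippet above, the condition would be something like () -> st.stemmerTipText().contains("german").

```java
import java.util.function.BooleanSupplier;

public class Poll {

    /**
     * Polls the condition until it holds or the timeout elapses.
     * Returns true if the condition became true in time, false on timeout or interruption.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long sleepMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in condition for the demo: becomes true roughly 50 ms after start.
        long start = System.currentTimeMillis();
        boolean ok = waitUntil(() -> System.currentTimeMillis() - start >= 50, 1000, 10);
        System.out.println("condition met: " + ok);
    }
}
```

The timeout and sleep interval are arbitrary illustration values; pick ones that match how long initialization takes in your setup.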

Felix

I followed this method and it worked. My IDE is NetBeans. I downloaded the jar from here; it is the second option under the "Snowball stemmers" title. I added it to my classpath and used the following code to attach the stemmer to the filter.

SnowballStemmer stemmer = new SnowballStemmer();
stemmer.setStemmer("english");
StringToWordVector filter = new StringToWordVector();
filter.setStemmer(stemmer);

Chamath Sajeewa