
I have a lot of text messages, and I run the following lines of code on each of them.

// tokenize the term
TokenStream tokenStream = new ClassicTokenizer(LUCENE_VERSION, new StringReader(term));

// stem the tokens
tokenStream = new PorterStemFilter(tokenStream);

Sometimes I get the error below, and sometimes I don't:

# A fatal error has been detected by the Java Runtime Environment:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x00000000025f8360, pid=1688, tid=7492
#
# JRE version: 7.0-b147
# Java VM: Java HotSpot(TM) 64-Bit Server VM (21.0-b17 mixed mode windows-amd64     compressed oops)
# Problematic frame:
# J  org.apache.lucene.analysis.PorterStemmer.stem(I)Z
#
# Failed to write core dump. Minidumps are not enabled by default on client versions of   Windows
#

What should I do?

Dave Newton
  • Have you tried using one of the analyzers like EnglishAnalyzer - http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/en/EnglishAnalyzer.html which will stem and tokenize it for you? – nbz May 02 '14 at 11:11
  • I have this line before the code above: tokenStream = new StopFilter(LUCENE_VERSION, tokenStream, EnglishAnalyzer.getDefaultStopSet()); but when I print the terms, they are not stemmed, so I used the code above for stemming. – user3582044 May 02 '14 at 11:47

1 Answer


Upgrade your JVM. Your crash log shows JRE version 7.0-b147, which is the initial Java 7 release. It's well documented on the Lucene website that you cannot use Java 1.7.0 because of a bug in the Oracle JVM.
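If you want to guard against this at startup, something like the following might help. It is only a rough sketch: the class and method names are made up for illustration, and it assumes the affected initial release reports the `java.version` system property as exactly "1.7.0" (later updates append a suffix such as "_01"). It does not detect the JVM bug itself, it only flags the version string.

```java
public final class JvmVersionCheck {

    // Assumption: the initial Java 7 release reports java.version as exactly
    // "1.7.0"; update releases append a suffix such as "_01".
    static boolean isInitialJava7(String javaVersion) {
        return "1.7.0".equals(javaVersion);
    }

    public static void main(String[] args) {
        // java.version is a standard system property set by the JVM.
        String version = System.getProperty("java.version");
        if (isInitialJava7(version)) {
            System.err.println("Running on Java " + version
                    + "; upgrade the JVM before using the Lucene stemmer.");
        } else {
            System.out.println("Java " + version
                    + " is not the initial 1.7.0 release.");
        }
    }
}
```

Running this with the same `java` executable that runs your application shows whether that JVM is the one named in the crash report.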

Robert Muir