I am parsing a document that contains RTF content using Apache Tika, but it throws an exception instead of returning the document's contents.

Here is the relevant code:

public String contentEx(File f) throws IOException, SAXException,
        TikaException {

    System.out.println(f.getName());

    Parser ps = new AutoDetectParser();
    BodyContentHandler bch = new BodyContentHandler();
    Metadata metadata = new Metadata();

    // try-with-resources closes the stream even if parsing throws
    try (InputStream is = new FileInputStream(f)) {
        ps.parse(is, bch, metadata, new ParseContext());
    }

    return bch.toString();
}

But when I call this method like this:

public static void main(String[] args) throws IOException, SAXException,
        TikaException {

    StanfrdEntityExtr see = new StanfrdEntityExtr();
    File Resum_F = new File("/home/rahul/Documents/resumes/212/swetank.docx");
    String s1 = see.contentEx(Resum_F);
}

it throws this exception:

Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.rtf.RTFParser@39614c6
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at stranfordParse.StanfrdEntityExtr.contentEx(StanfrdEntityExtr.java:92)
at stranfordParse.StanfrdEntityExtr.main(StanfrdEntityExtr.java:50)

Caused by: java.lang.ArrayIndexOutOfBoundsException: 9
at org.apache.tika.parser.rtf.TextExtractor.processControlWord(TextExtractor.java:872)
at org.apache.tika.parser.rtf.TextExtractor.parseControlWord(TextExtractor.java:566)
at org.apache.tika.parser.rtf.TextExtractor.parseControlToken(TextExtractor.java:492)
at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:459)
at org.apache.tika.parser.rtf.TextExtractor.extract(TextExtractor.java:448)
at org.apache.tika.parser.rtf.RTFParser.parse(RTFParser.java:56)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
... 4 more

How can I solve this exception, and how can I correctly extract the content of this document using Apache Tika? I found some suggested solutions, but none of them work.

Any ideas? Any help will be greatly appreciated!

Rahul Kulhari
  • What version of Apache Tika are you working with? And if it isn't the latest, have you tried upgrading? – Gagravarr Aug 02 '13 at 11:46
  • I am working with **Tika 1.4**. It is the latest version available online. – Rahul Kulhari Aug 02 '13 at 12:11
  • Looks like you'll need to [open a new issue in the Apache Tika bug tracker](https://issues.apache.org/jira/browse/TIKA), list the stacktrace, and upload a file that triggers the problem so the project can investigate. – Gagravarr Aug 02 '13 at 16:45
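
Before filing the issue, it may be worth confirming what the file really contains: it is named `.docx`, yet the stack trace shows `RTFParser` handling it. The sketch below uses only the JDK and rests on two assumptions: RTF files begin with the characters `{\rtf`, and a real `.docx` is a ZIP container beginning with `PK`. The class and method names (`MagicBytesCheck`, `sniffBytes`, `sniffFile`) are made up for illustration and are not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class MagicBytesCheck {

    /** Classifies content by its leading bytes (hypothetical helper, not a Tika API). */
    public static String sniffBytes(byte[] head) {
        // ISO-8859-1 maps each byte to one char, so no bytes are lost in decoding
        String start = new String(head, StandardCharsets.ISO_8859_1);
        if (start.startsWith("{\\rtf")) {
            return "rtf";   // assumption: RTF documents open with {\rtf
        }
        if (start.startsWith("PK")) {
            return "zip";   // assumption: a real .docx is a ZIP container
        }
        return "unknown";
    }

    /** Reads up to the first five bytes of the file and classifies them. */
    public static String sniffFile(Path file) throws IOException {
        byte[] head = new byte[5];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;
                }
                total += n;
            }
        }
        return sniffBytes(Arrays.copyOf(head, total));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MagicBytesCheck <file>");
            return;
        }
        System.out.println(args[0] + " looks like: " + sniffFile(Paths.get(args[0])));
    }
}
```

If this reports `rtf` for the `.docx` file, the document is RTF saved under a misleading extension, which would be useful to mention in the bug report alongside the stack trace.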
