ForkParser is a new Tika parser, introduced in Tika 0.9 and located in org.apache.tika.fork. It forks off a new JVM process to analyze the passed file stream. I figured this might be a good way to constrain how much memory I'm willing to devote to Tika's metadata extraction. However, the Metadata object is not being populated with the appropriate metadata properties the way it is when using an AutoDetectParser. Tests have shown that the BodyContentHandler object is not null.

Why is the Metadata object not being populated with anything (except the manually added RESOURCE_NAME_KEY)?

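import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.fork.ForkParser;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// "logger" is a class-level logger declared elsewhere in the enclosing class.
public static Metadata getMetadata(File f) {
    Metadata metadata = new Metadata();
    FileInputStream fis = null;
    try {
        fis = new FileInputStream(f);
        // -1 disables the write limit on the extracted body text
        BodyContentHandler contentHandler = new BodyContentHandler(-1);
        ParseContext context = new ParseContext();
        // Parse in a forked JVM whose heap is capped at 64 MB
        ForkParser parser = new ForkParser();
        parser.setJavaCommand("/usr/local/java6/bin/java -Xmx64m");
        metadata.set(Metadata.RESOURCE_NAME_KEY, f.getName());
        parser.parse(fis, contentHandler, metadata, context);
        String contentType = metadata.get(Metadata.CONTENT_TYPE);
        logger.error("contentHandler: " + contentHandler.toString());
        logger.error("metadata: " + metadata.toString());
        return metadata;
    } catch (Throwable e) {
        logger.error("Exception while analyzing file\n" +
                "CAUTION: metadata may still have useful content in it!\n" +
                "Exception: " + e, e);
        return metadata;
    } finally {
        // Close the stream even when parsing throws
        if (fis != null) {
            try {
                fis.close();
            } catch (IOException ignored) {
                // nothing useful to do if close fails
            }
        }
    }
}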
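
To make the memory-capping idea concrete, here is a minimal sketch of the general technique using only the JDK: starting a child JVM with its own -Xmx limit. This is not how ForkParser is implemented internally. The class ForkedJvmSketch and the helper buildCommand are hypothetical names made up for illustration, and the sketch assumes a java executable is on the PATH and a Java 7+ runtime (for inheritIO).

```java
import java.util.ArrayList;
import java.util.List;

public class ForkedJvmSketch {

    // Hypothetical helper: builds the argument list for a child JVM
    // whose maximum heap is capped at maxHeapMb megabytes.
    public static List<String> buildCommand(String javaExecutable, int maxHeapMb, String... extraArgs) {
        if (maxHeapMb <= 0) {
            throw new IllegalArgumentException("maxHeapMb must be positive");
        }
        List<String> command = new ArrayList<String>();
        command.add(javaExecutable);
        command.add("-Xmx" + maxHeapMb + "m");
        for (String arg : extraArgs) {
            command.add(arg);
        }
        return command;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: "java" resolves via the PATH; "-version" stands in for real work.
        List<String> command = buildCommand("java", 64, "-version");
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.inheritIO(); // forward the child's output to this console (Java 7+)
        Process child = builder.start();
        int exitCode = child.waitFor();
        System.out.println("child JVM exited with code " + exitCode);
    }
}
```

The heap cap applies only to the child process, so the parent JVM keeps whatever limit it was started with. That separation is the property I was hoping to get from ForkParser.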