
I'm using the MassIndexer to index the domain model of a project I'm working on; the model includes file contents stored as bytes in the database. I've annotated the collections of files inside my domain objects with the TikaBridge annotation.

Most of the files I have access to (200+, in various formats) index fine, but occasionally a file fails to parse for whatever reason. When that happens, the indexer seems to stop processing its entire current batch of domain objects. I opened an issue about it at https://hibernate.atlassian.net/browse/HSEARCH-1354

According to the documentation, you can create a custom error handler to deal with this type of problem: http://docs.jboss.org/hibernate/search/4.3/reference/en-US/html_single/#d0e2582

However, I can't find a way to tell Hibernate Search to simply ignore the parse error and continue indexing.

Can someone point me in the right direction and explain how to create a custom ErrorHandler that ignores Tika document parsing errors?
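For reference, as I understand the Hibernate Search 4.x API, org.hibernate.search.exception.ErrorHandler declares handle(ErrorContext) and handleException(String, Throwable), and a custom implementation is registered by putting its fully qualified class name in the hibernate.search.error_handler property. Whatever such a handler does, it first has to recognize a Tika parse failure among the exceptions it receives. The sketch below covers only that step: TikaFailureFilter is a hypothetical helper, not a Hibernate Search class, and it matches org.apache.tika.exception.TikaException (and its subclasses) by class name so it compiles without Tika on the classpath. On its own it does not stop the MassIndexer from dropping the rest of the batch.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper (not part of Hibernate Search): decides whether a failure
// reported to an error handler was caused by a Tika parse error.
final class TikaFailureFilter {

    // Tika's base exception, matched by name so no compile-time Tika dependency is needed.
    static final String TIKA_EXCEPTION = "org.apache.tika.exception.TikaException";

    private TikaFailureFilter() {}

    /** True if any throwable in the cause chain is, or extends, one of the named classes. */
    static boolean causedBy(Throwable failure, Set<String> classNames) {
        // Identity-based set so a cyclic cause chain cannot loop forever.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable t = failure; t != null && seen.add(t); t = t.getCause()) {
            // Walk the class hierarchy so subclasses of a named exception also match.
            for (Class<?> c = t.getClass(); c != null; c = c.getSuperclass()) {
                if (classNames.contains(c.getName())) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Convenience check against Tika's base exception type. */
    static boolean isTikaParseFailure(Throwable failure) {
        return causedBy(failure, Set.of(TIKA_EXCEPTION));
    }
}
```

A handler's handle(ErrorContext) could, for example, pass context.getThrowable() to TikaFailureFilter.isTikaParseFailure and only log in that case, while still reporting every other failure as before.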

Sanne
user1170235

1 Answer


I was unable to make the custom ErrorHandler solution work, so I ended up copying org.hibernate.search.bridge.builtin.TikaBridge into my codebase and modifying it to log parse errors and keep going instead of failing.
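The approach above boils down to wrapping the parse call so that a failure is logged and swallowed instead of propagated. The sketch below shows that pattern in plain Java, without Hibernate Search or Tika: TextExtractor is a hypothetical stand-in for the Tika parse step, and LenientExtraction is an illustrative name, not code from the modified bridge. In a copied TikaBridge the equivalent try/catch would sit around the Tika parse call, and a null result would mean "add no field for this file".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the Tika parse step inside the bridge.
@FunctionalInterface
interface TextExtractor {
    String extract(byte[] content) throws Exception;
}

// Sketch of "log and keep going": a failed parse yields no text
// instead of an exception that would abort the whole indexing batch.
final class LenientExtraction {

    private static final Logger LOG = Logger.getLogger(LenientExtraction.class.getName());

    private LenientExtraction() {}

    /** Returns the extracted text, or null if there is no content or parsing failed. */
    static String extractOrNull(String fieldName, byte[] content, TextExtractor extractor) {
        if (content == null) {
            return null;
        }
        try {
            return extractor.extract(content);
        } catch (Exception e) {
            // Log and swallow: the caller simply skips the field for this file.
            LOG.log(Level.WARNING, "Could not parse content for field '" + fieldName + "', skipping it", e);
            return null;
        }
    }

    /** Extracts every file; failures are logged and dropped rather than rethrown. */
    static List<String> extractAll(String fieldName, List<byte[]> files, TextExtractor extractor) {
        List<String> texts = new ArrayList<>();
        for (byte[] file : files) {
            String text = extractOrNull(fieldName, file, extractor);
            if (text != null) {
                texts.add(text);
            }
        }
        return texts;
    }
}
```

Catching Exception rather than Throwable still lets errors such as OutOfMemoryError propagate. The trade-off of this approach is that unparseable files silently drop out of the index, so it is worth logging enough to find the offending file later.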

I then wired the modified bridge in with the following annotations:

```java
// Index the raw file bytes through the patched copy of TikaBridge
@Field
@FieldBridge(impl = com.my.project.CustomTikaBridge.class)
private byte[] bytes;
```
user1170235