I am using the Solr DataImportHandler with Tika to search rich documents such as Word and PDF files. Whenever a new file is added or an existing file is changed, I have to run a full import to include the changes in the search. As the number of documents is very high, I need an option to re-index only the newly added or modified files (similar to delta-import). I know delta-import cannot be used with TikaEntityProcessor, and the clean=false attribute is not working for my scenario either. Is there any way I can achieve this? Thanks in advance for any response.
-
How are you telling Solr about your new or changed files? What you've described doesn't sound correct at all. – Gagravarr Oct 27 '13 at 18:33
-
I can use the newerThan parameter (within FileListEntityProcessor) to find which files have been modified. My question is: similar to delta-import, can I use something here to get incremental updates rather than running a full-import every time? – Susha Surendran Oct 27 '13 at 19:02
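
For reference, the newerThan idea mentioned in the comment above could be sketched roughly as the data-config below. This is only an illustration under assumptions, not a confirmed fix for the scenario in the question: the baseDir path, the fileName pattern, and the schema fields id, content and last_modified are hypothetical placeholders, and it assumes the uniqueKey is the file path so that a re-imported file overwrites its earlier version.

```xml
<dataConfig>
  <!-- Binary data source so Tika can read the raw file bytes -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity: lists only files modified since the last import.
         baseDir and fileName are hypothetical placeholders. -->
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null"
            baseDir="/path/to/docs" fileName=".*\.(pdf|doc|docx)"
            recursive="true" rootEntity="false"
            newerThan="${dataimporter.last_index_time}">
      <!-- id, last_modified and content are assumed schema fields -->
      <field column="fileAbsolutePath" name="id"/>
      <field column="fileLastModified" name="last_modified"/>
      <!-- Inner entity: extracts text from each listed file -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a config along these lines, the import would be triggered with command=full-import&clean=false, so that documents already in the index are kept and only the files listed by the outer entity are re-indexed. Note that files deleted from disk would not be removed from the index by this approach.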