1

I need to index a long list of documents (mostly MS Office formats and PDF), perform full-text search on them, and support versioning.

I read about Lucene, but it seems far from being a complete solution. Does anyone know of a complete commercial indexer?

Michele

2 Answers

1

For versioning, use Git or Mercurial.

For the "full text search" I found some links:

http://zez.org/article/view/83/

http://www.phpriot.com/articles/zend-search-lucene

PiTheNumber
  • Thanks for the reply. Git and Mercurial are too big to implement easily, and the Lucene PHP API doesn't work well. – Michele Mar 13 '12 at 10:01
  • You can generate [MD5 hashes](http://www.php.net/manual/en/function.md5-file.php) of your files to track changes and back up the files manually. – PiTheNumber Mar 13 '12 at 10:03
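
The hash-based change tracking suggested in the comment above could look roughly like this. It is only a minimal sketch, written in Python with the standard `hashlib` module rather than PHP's `md5_file`, and the function names (`file_md5`, `snapshot`, `diff_snapshots`) are made up for illustration. It assumes all documents live under a single directory and that comparing two snapshots is enough to decide what needs a backup or a re-index.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(root):
    """Map every file under root (path relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(old, new):
    """Return (added, changed, removed) sorted path lists between two snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, removed
```

In practice you would store the result of `snapshot()` somewhere (for example as a JSON file), take a new snapshot on the next run, and back up or re-index whatever `diff_snapshots()` reports as added or changed. MD5 is fine for spotting changes here, but it is not a safeguard against deliberate tampering.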
0

You can try Recognition Server. It is high-volume OCR, document conversion, and indexing software: http://www.abbyy.com/recognition_server/

It creates searchable digital archives. You can download a trial version and try it for free.