Is it possible to use OpenGrok to index PPT, XLS, DOC, and similar formats? Would I have to implement this myself, or is there already a plugin or method for doing it?
1 Answer
There is currently no dedicated analyzer for extracting data from these document types. However, it should be possible to implement one using the Java libraries listed in "Read Microsoft Word Documents into Plain Text (DOC, DOCX) in Java", e.g. Apache POI or Apache Tika.
Feel free to file a new issue at https://github.com/oracle/opengrok/issues
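As a rough illustration of the text-extraction step such an analyzer would need, here is a minimal sketch that pulls plain text out of a DOCX file using only the JDK. It relies on the fact that a DOCX file is a ZIP archive whose main body lives in `word/document.xml`. The class and method names (`DocxTextSketch`, `extractText`) are hypothetical, this is not OpenGrok code, and it does not show how to register an analyzer with OpenGrok. It also does not handle the legacy binary DOC, XLS, and PPT formats, for which a library such as Apache POI or Apache Tika would be the practical choice.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of OpenGrok: extracts paragraph text from a DOCX stream.
class DocxTextSketch {
    // Main document part inside the DOCX ZIP container.
    static final String MAIN_PART = "word/document.xml";
    // WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements.
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one line per paragraph, or an empty string if the main part is missing.
    static String extractText(InputStream docx)
            throws IOException, ParserConfigurationException, SAXException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (MAIN_PART.equals(entry.getName())) {
                    // readAllBytes() stops at the end of the current ZIP entry.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        return "";
    }

    private static String textOf(byte[] xml)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            // Concatenate every text run in the paragraph, then end the line.
            NodeList runs = ((Element) paragraphs.item(i)).getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }
}
```

Calling `DocxTextSketch.extractText(Files.newInputStream(path))` on a DOCX file would return its paragraph text, which could then be handed to whatever indexing code consumes plain text. The sketch ignores headers, footers, footnotes, and nested structures such as text boxes, so a real analyzer would likely delegate to POI or Tika rather than parse the XML by hand.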

Vlad