4

Can anyone please suggest a method for indexing a CHM file, in the way that PDFBox can be used for PDF?

Biswanath Chowdhury
  • Apache Tika is more commonly used with Lucene; I just didn't know it supported CHM. So please accept deathy's answer. – ffriend Jun 13 '11 at 14:13

2 Answers

3

If you're talking about Microsoft Compiled HTML Help files, you can extract the text from them with JChm and then index it in the usual way.
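
To make the "extract, then index" idea concrete, here is a rough sketch of the second half only. It assumes the HTML pages have already been pulled out of the .chm file (for example with JChm, whose API is not shown here), and it uses a toy in-memory inverted index as a stand-in for Lucene. The class and method names (`ChmTextIndexSketch`, `stripTags`, `buildIndex`) are made up for illustration and do not belong to JChm or Lucene.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy stand-in for the "extract text, then index it" pipeline.
// Nothing here is JChm or Lucene API; all names are hypothetical.
final class ChmTextIndexSketch {

    // Crude tag stripper for illustration only; a real HTML parser is safer.
    static String stripTags(String html) {
        return html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                   .replaceAll("<[^>]+>", " ")
                   .replaceAll("\\s+", " ")
                   .trim();
    }

    // Maps each lowercased term to the names of the pages containing it.
    static Map<String, Set<String>> buildIndex(Map<String, String> pages) {
        Map<String, Set<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            String text = stripTags(page.getValue()).toLowerCase(Locale.ROOT);
            for (String term : text.split("[^\\p{L}\\p{N}]+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, k -> new TreeSet<>()).add(page.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-ins for HTML pages already extracted from a .chm file.
        Map<String, String> pages = new TreeMap<>();
        pages.put("intro.html", "<html><body><h1>Getting Started</h1><p>Install the tool.</p></body></html>");
        pages.put("faq.html", "<html><body><p>How do I install it?</p></body></html>");

        Map<String, Set<String>> index = buildIndex(pages);
        System.out.println(index.get("install")); // pages that mention "install"
    }
}
```

In a real setup the plain text returned by something like `stripTags` would be added to a Lucene document instead of this map; the point is only that once the CHM content is plain text, indexing it is no different from indexing any other text.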

ffriend
3

If you also have other document formats that you need to index, you might find a better and more general solution in Apache Tika.

They recently added a CHM parser (for reference: Support of CHM Format), and it will be in the next version.

Cristian Vat