I know that Apache Tika is a text extractor: it can extract text from DOC, PDF, PPT and many other file formats. I now need the same functionality on iOS. Is there an alternative to Apache Tika for iOS?

If there is no such library for iOS, please point me to tools that can extract text from specific file formats.

Thank you in advance.

PeeHaa
jjyao

1 Answer

libopc can be used to extract text from docx, xlsx and pptx files.

Antiword handles the older MS Word (.doc) format.

You can also extract strings from a PDF using CoreGraphics, or using PDFiPhone.

If you also need to extract text from an HTML document, have a look at NSXMLParser.
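
As a rough illustration of the NSXMLParser suggestion, here is a minimal sketch in Swift (where the class is spelled `XMLParser`). It assumes the input is well-formed XHTML, since an XML parser will reject ordinary "tag soup" HTML. The names `TextCollector` and `extractText(fromXHTML:)` are made up for this example and are not part of any library.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

/// Hypothetical helper: accumulates every run of character data the parser reports.
final class TextCollector: NSObject, XMLParserDelegate {
    private(set) var text = ""

    // Called by XMLParser for each chunk of text between tags.
    // A single text node may arrive in several pieces, so we append.
    func parser(_ parser: XMLParser, foundCharacters string: String) {
        text += string
    }
}

/// Hypothetical helper: returns the concatenated text nodes of `markup`,
/// or nil if the parser reports a failure (e.g. the input is not well-formed XML).
func extractText(fromXHTML markup: String) -> String? {
    let parser = XMLParser(data: Data(markup.utf8))
    let collector = TextCollector()
    parser.delegate = collector
    guard parser.parse() else { return nil }
    return collector.text
}
```

Calling `extractText(fromXHTML: "<p>Hello</p>")` should give back the text inside the tags. For real-world HTML that is not valid XML, a more tolerant HTML parser would be needed instead.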

  • Thank you for your answer. It is very useful. I would also like to know how to extract text from iWork files (Pages, Keynote, Numbers). Can you give me some hints? – jjyao Sep 06 '12 at 13:21