3

I have seen the 1.0 release of Apache Tika, which greatly eases metadata extraction in Java, and I'm wondering whether it can be used on Android.

parser_failed

3 Answers

2

I suspect you should be able to port the core of Tika to Android. However, you're likely to have issues with many of Tika's dependencies, so many of the parsers won't work.

For example, one of Apache Tika's dependencies is Apache POI. People have tried to compile POI for Android, but have hit the method limit that Android imposes (a single DEX file can reference at most 65,536 methods). Here's one discussion of this on the POI lists, and here's another.

You're likely to hit similar issues with some of Tika's other dependencies too. So I'd expect that getting the core in won't be too bad, but you'll have to cut out some of the parsers to fit within Android's limitations.
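One way to decide which parsers to cut is to probe, at runtime, which parser classes can be loaded at all. The following is only a minimal sketch: `ParserProbe` and its methods are hypothetical helpers, not part of Tika, and the check only catches classes that are missing outright or fail to link at load time, so a parser may still fail later when it first touches a missing dependency.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Tika): reports which classes the runtime can load. */
final class ParserProbe {

    private ParserProbe() {}

    /** Returns true if the named class can be loaded; static initializers are not run. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class, do not execute its static blocks
            Class.forName(className, false, ParserProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is absent, or something it needs at load time could not be resolved
            return false;
        }
    }

    /** Keeps only the class names that can be loaded, preserving their order. */
    static List<String> loadable(List<String> classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            if (isLoadable(name)) {
                result.add(name);
            }
        }
        return result;
    }
}
```

For instance, passing a list containing `org.apache.tika.parser.mp3.Mp3Parser` and `org.apache.tika.parser.microsoft.OfficeParser` to `ParserProbe.loadable` would return only the parsers actually present in a stripped-down build.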

Gagravarr
  • You are right. I successfully ported the core, but I'm running into many issues with the parsers (method limit, dependencies, size, etc.). I guess the best thing to do is to activate only the needed parsers or reimplement them. – parser_failed Nov 23 '11 at 10:44
  • @parser_failed: How did you port core? I've been trying to do it but no luck so far: http://stackoverflow.com/questions/10420896/trying-to-port-tika-1-0-to-android-in-eclipse-error-messages-refercing-pom-xml Thx – I Z May 03 '12 at 15:01
1

Yes, it can be. However, you should probably extract only the parsers you need, since it is a fairly large library to include in a mobile application. My project uses the MP3, FLAC, OGG, and Vorbis parsers to retrieve metadata from audio files. Here is a link to the stripped-down JAR file if you are interested:

http://servestream.svn.sourceforge.net/viewvc/servestream/trunk/lib/tika-app-1.0.jar?view=log

William Seemann
1

I'm working on getting part of it to work for API 7 (I haven't quite finished getting it to build). If you're working with API 8+, you can ignore this, but API 7 doesn't implement javax.xml.namespace. I found an independent implementation here, and after importing it, a lot of problems disappeared. I'll let you know if I run into any problems as a result.

bibismcbryde