
I have a SOAP web service with a method that lets the caller upload a PDF, JPG, PNG or BMP file. To process the file correctly I need to determine its MIME type from the DataHandler. I tried to detect it with Apache Tika:

Tika tika = new Tika();
InputStream stream = dataHandler.getInputStream();
String mimeType = tika.detect(stream);

Now my problem:

Most of the time Tika detects the correct MIME type, but for a few uploaded JPGs it reports text/plain instead of image/jpeg. How can I solve this?

Thank you in advance!

VenoxX
  • What version of Apache Tika are you using? What happens if you upgrade? Oh, and do you have the filename to hand? Tika can guess better when given the filename too – Gagravarr Aug 24 '16 at 09:46
  • I'm using the current version, 1.13. Unfortunately I don't have a filename; that's one reason why I need the MIME type ;-) – VenoxX Aug 24 '16 at 11:46
  • Valid jpeg files shouldn't be detected as text. Assuming you get the same issue on a recent nightly build, could you open a new Tika bug and upload a file that shows the problem? – Gagravarr Aug 24 '16 at 12:24

1 Answer


I don't know what's wrong with Tika, but as an alternative you could try MimeUtil, which does pretty much the same thing, does it well, and is more flexible because you can configure it easily.

// Define the mime type detector to use, here it will be MagicMimeMimeDetector
// As you intend to detect from a Stream
// To be done only once in a static block of your class for example    
MimeUtil.registerMimeDetector("eu.medsea.mimeutil.detector.MagicMimeMimeDetector");
...
// Get the collection of matching mime types
Collection<?> mimeTypes = MimeUtil.getMimeTypes(stream);

More details about the class MimeUtil here.
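
If neither library gives a reliable answer for a particular file, one fallback is to check the leading bytes yourself, since only four formats are accepted here. The following is a minimal sketch under that assumption, not how Tika or MimeUtil work internally. The class name MagicByteSniffer and the method sniff are hypothetical; the signatures checked are the well-known magic numbers for JPEG, PNG, PDF and BMP.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper for illustration; not part of Tika or MimeUtil.
final class MagicByteSniffer {

    // The longest signature checked below (PNG) is 8 bytes.
    private static final int HEADER_LENGTH = 8;

    // Returns a MIME type guessed from the first bytes of the stream.
    // The stream must support mark/reset so it can be rewound afterwards.
    static String sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(HEADER_LENGTH);
        byte[] head = new byte[HEADER_LENGTH];
        int total = 0;
        try {
            int n;
            // A single read() may return fewer bytes than requested, so loop.
            while (total < HEADER_LENGTH
                    && (n = in.read(head, total, HEADER_LENGTH - total)) != -1) {
                total += n;
            }
        } finally {
            in.reset(); // leave the stream positioned at its start
        }
        if (startsWith(head, total, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";
        }
        if (startsWith(head, total, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "image/png";
        }
        if (startsWith(head, total, 0x25, 0x50, 0x44, 0x46, 0x2D)) { // "%PDF-"
            return "application/pdf";
        }
        if (startsWith(head, total, 0x42, 0x4D)) { // "BM"
            return "image/bmp";
        }
        return "application/octet-stream"; // unknown: caller decides what to do
    }

    // True if the first bytes of data match the given unsigned signature.
    private static boolean startsWith(byte[] data, int length, int... signature) {
        if (length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        byte[] jpegStart = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        System.out.println(sniff(new ByteArrayInputStream(jpegStart)));
    }
}
```

A stream obtained from a DataHandler may not support mark/reset, so wrapping it first, for example with new BufferedInputStream(dataHandler.getInputStream()), keeps it readable after the check.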

Nicolas Filotto