
I'm using the Java library Tika by Apache (tika-core ver. 1.10).

Is there an org.apache.tika.detect.Detector for CSV files? The MIME type should be text/csv, but I cannot find anything like that.

I would like to use the convenient detect method.

Mads Hansen
mat_boy
    The main `MimeTypes` detector should cover you for that. What happens if you just try with `DefaultDetector` or `TikaConfig.getDefaultConfig().getDetector()`? – Gagravarr Aug 21 '15 at 12:16

1 Answer


Currently (v1.10) tika-mimetypes.xml defines text/csv like this:

<mime-type type="text/csv">
  <glob pattern="*.csv"/>
  <sub-class-of type="text/plain"/>
</mime-type>

This means that Apache Tika detects CSV only by filename: the definition contains a glob pattern but no content-based (magic) rule. If you use Tika#detect(File), Tika adds the filename (under the Metadata.RESOURCE_NAME_KEY key) to the Metadata object passed to the detector. URLs are handled in a similar way.
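As a rough illustration of what name-only detection amounts to, here is a minimal plain-Java sketch. It is not Tika's implementation: the class name `CsvNameSketch`, the fallback type, and the case-insensitive comparison are all assumptions made for the example.

```java
import java.util.Locale;

// Illustration only: a stand-in for glob-based detection, NOT Tika's code.
final class CsvNameSketch {
    // Assumed fallback for names that do not match the pattern.
    static final String FALLBACK = "application/octet-stream";

    // Mimics the effect of <glob pattern="*.csv"/>: only the name is inspected.
    static String detectByName(String resourceName) {
        if (resourceName == null) {
            return FALLBACK; // no name available, so the pattern cannot match
        }
        // Case-insensitive matching is a choice made for this sketch.
        String lower = resourceName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".csv") ? "text/csv" : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(detectByName("report.csv"));
        System.out.println(detectByName("report.txt"));
    }
}
```

The point of the sketch is that the content never enters the decision, so a stream without a name cannot be recognized as CSV this way.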

If you only have a stream and want to supply the filename yourself, you can use something like:

new Tika().detect(is, fileName)

If you want content-based heuristics, feel free to check Tika's JIRA for an existing ticket and file one if there is none.
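In the meantime, a content check can be done by hand before or after calling Tika. The following is a minimal sketch of one possible heuristic, not anything Tika provides: the class `CsvContentSketch` and its rule (at least two non-empty lines, each with the same non-zero number of commas outside double quotes) are assumptions, and quoted fields that span several lines are not handled.

```java
// Hypothetical heuristic, written for illustration; not part of Apache Tika.
final class CsvContentSketch {
    // Counts commas that are outside double-quoted fields in a single line.
    static int countSeparators(String line) {
        int count = 0;
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles twice, so the state is preserved.
                inQuotes = !inQuotes;
            } else if (c == ',' && !inQuotes) {
                count++;
            }
        }
        return count;
    }

    // True if the sample has at least two non-empty lines and every such
    // line has the same non-zero number of separators.
    static boolean looksLikeCsv(String sample) {
        int expected = -1;
        int lines = 0;
        for (String line : sample.split("\\r?\\n")) {
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            int n = countSeparators(line);
            if (n == 0) {
                return false; // a line without any separator
            }
            if (expected == -1) {
                expected = n; // first data line fixes the column count
            } else if (n != expected) {
                return false; // inconsistent column count
            }
            lines++;
        }
        return lines >= 2;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeCsv("id,name\n1,Ann\n2,Bob\n"));
        System.out.println(looksLikeCsv("just some plain text\nwith two lines"));
    }
}
```

A check like this could be applied to the first few kilobytes of a stream that Tika reports as text/plain. Being a heuristic, it will misclassify some inputs, for example ordinary prose in which every line happens to contain the same number of commas.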