I have two email test files:
1. A file that has been created by using "save as" in Mac Mail (this creates a .txt file)
2. A file that has been created by dragging an email from Mac Mail to the Desktop (this creates an .eml file)
If I send each file to Tika with
curl -T filename http://localhost:9998/detect/stream
I get the response "message/rfc822" for both files.
If I run
curl -T filename http://localhost:9998/meta
I get the metadata, but for file (1) the date is not extracted, while for file (2) it is.
I understand, of course, that the .eml file includes the full raw header, while the .txt file only includes a very abbreviated header. However, even the abbreviated header does include a "Date" field, so I would expect Tika to extract it. Is this a bug or intentional? If it is intentional, is there anything I can do to get Tika to extract the date in case (1)?
I am running Tika-server 1.14.
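To compare what the two files actually carry in their "Date" line, independent of Tika, a small shell helper along these lines may be useful. It is only a sketch: the function name first_date_header is made up for illustration, and it assumes each file starts with a header block that ends at the first blank line.

```shell
# Hypothetical helper: print the first "Date:" line found in the header
# block of a mail file. Assumes the headers end at the first blank line.
first_date_header() {
    awk '
        /^\r?$/ { exit }                           # blank line: end of headers
        tolower($0) ~ /^date:/ { print; exit }     # first Date header, any case
    ' "$1"
}
```

Running it on both test files (for example `first_date_header message.txt` and `first_date_header message.eml`, where the file names are placeholders) shows the two date strings side by side, which makes it easy to see whether they are written in the same format.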