
I am trying to detect Word documents, but detection fails for some of them. I can open the problematic documents just fine in Word, and I can also extract them, since they are basically just zip files.

I have also tried adding msooxml to /etc/magic on my Ubuntu 18.04 server and restarting nginx and PHP-FPM, with no luck. I use PHP 7.4.4.

  • Is there a way to check whether the msooxml magic file is actually being used?
  • Do you have any suggestions to make it work?

Here is my code:

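<?php
$file = 'broken.docx';

// Detect using the default magic database.
$mime = mime_content_type($file);
echo '<p>'.$mime.'</p>';

// Detect using the msooxml magic file (the second argument is the path to a magic database).
$finfo = new finfo(FILEINFO_MIME, 'msooxml');
echo '<p>'.$finfo->file($file).'</p>';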

Result:

application/octet-stream

application/octet-stream; charset=binary

Expected result:

application/vnd.openxmlformats-officedocument.wordprocessingml.document

application/vnd.openxmlformats-officedocument.wordprocessingml.document; charset=binary

msooxml file taken from: https://github.com/file/file/blob/master/magic/Magdir/msooxml
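
One fallback that does not depend on the magic database at all, as a minimal sketch: since a .docx is a zip archive, PHP's ZipArchive can look for the parts a Word document normally contains. The helper name `looksLikeDocx` is hypothetical. Treating `[Content_Types].xml` plus `word/document.xml` as sufficient evidence is an assumption, since a package could in principle store its main document part under another name.

```php
<?php
// Hypothetical helper: returns true when $path is a zip archive that
// contains the parts a .docx file conventionally has.
function looksLikeDocx(string $path): bool
{
    $zip = new ZipArchive();

    // open() returns true on success, or an integer error code otherwise.
    if ($zip->open($path) !== true) {
        return false;
    }

    // locateName() returns the entry index, or false when the entry is absent.
    $hasContentTypes = $zip->locateName('[Content_Types].xml') !== false;
    // Assumption: the main document part lives at the conventional location.
    $hasMainDocument = $zip->locateName('word/document.xml') !== false;

    $zip->close();

    return $hasContentTypes && $hasMainDocument;
}
```

This could run only when finfo reports application/octet-stream or application/zip, so that the magic-based result is still preferred whenever it works.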

Cudos
  • Does this answer your question? [DOCX File type in PHP finfo_file is application/zip](https://stackoverflow.com/questions/6595183/docx-file-type-in-php-finfo-file-is-application-zip) – Nico Haase Apr 22 '20 at 15:04
  • It doesn't really answer my question, and the answers in your link seem a bit unclear. – Cudos Apr 24 '20 at 06:49
