
I'm interested in validating medical image files of certain formats. By "validate" I mean making sure they really are files of that kind and not, say, malware disguised as one. For example, if someone has a file virus.exe and renames it to virus.dcm, I'd like to be able to tell that it's not a legitimate .dcm file.

I've seen an answer for validating DICOM files that says I should look at offset 0x80 for a certain label. But I'm not sure whether someone could simply insert that label into virus.dcm.

The file types I want to validate are DICOM files (.dcm, .PAR/.REC), NIfTI files (.nii, .nii.gz), ANALYZE files (.img/.hdr), and .zip files.

I'm not looking for code per se (though that would be nice), but I'd like to know what's the best way to distinguish legitimate files of these types from malware files that have been changed to look like these files.

WhiteTiger

1 Answer


Validating a DICOM file is quite difficult: the DICOM standard allows the first 128 bytes of the file (the preamble) to contain absolutely anything, including executable code. The DICM signature follows the preamble, at offset 0x80.
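A minimal sketch of that signature check in Python, assuming the layout described above (a 128-byte preamble followed by the four bytes `DICM`). The function name `has_dicm_signature` is made up for illustration. Passing this check only shows that the marker is present; anyone can write those four bytes at offset 0x80, so it does not prove the file is harmless.

```python
def has_dicm_signature(path):
    """Return True if the 4 bytes at offset 0x80 are b"DICM"."""
    with open(path, "rb") as f:
        f.seek(0x80)               # skip the 128-byte preamble
        return f.read(4) == b"DICM"
```

A renamed executable that nobody bothered to patch will fail this test, which makes it useful as a first filter and nothing more.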

So, even if you manage to open the DICOM file and see a valid image and tags in a DICOM viewer, the file could still contain executable code in the first 128 bytes (it would probably contain pointers to some portions at the end of the DICOM data).
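One partial mitigation is to inspect the preamble itself. The sketch below is only a heuristic and rests on assumptions: it flags files whose first bytes match the Windows executable (`MZ`) or ELF magic numbers, and the names `EXECUTABLE_MAGICS` and `preamble_looks_executable` are hypothetical. It will not catch other kinds of payload, and a clean result says nothing about the rest of the file.

```python
# Magic numbers for Windows executables (MZ) and ELF binaries.
# This list is illustrative, not exhaustive.
EXECUTABLE_MAGICS = (b"MZ", b"\x7fELF")

def preamble_looks_executable(path):
    """Return True if the 128-byte preamble starts with a known executable magic."""
    with open(path, "rb") as f:
        preamble = f.read(128)
    return preamble.startswith(EXECUTABLE_MAGICS)
```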

I suggest marking all DICOM files as non-executable, using chmod on Linux or this suggestion on Windows.
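On a POSIX system, the same effect as `chmod a-x` can be scripted. This is a sketch, assuming the files are stored on a Linux filesystem; the helper name `clear_exec_bits` is made up for illustration.

```python
import os
import stat

def clear_exec_bits(path):
    """Remove the execute permission for user, group, and others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    no_exec = mode & ~(stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    os.chmod(path, no_exec)
```

This only stops the file from being launched directly as a program; it does nothing about content that targets bugs in whatever software reads the file.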

Paolo Brandoli
  • I'd like to "accept" this, but will it discourage others from answering my question regarding the other file types? – WhiteTiger Sep 02 '15 at 21:55
  • Also, if I set the file to non executable via `chmod -x`, does that mean I don't have to worry about the file contents being malicious? – WhiteTiger Sep 02 '15 at 22:02
  • @WhiteTiger the file could still contain malicious parts: it could exploit a bug in the reader and cause code execution. For instance, it could nest so many sequences inside each other that it causes a stack overflow in the reader, unless the reader validates the sequences, avoids recursion when reading them, or limits the recursion depth. Or it could use a specially crafted JPEG that causes the reader to crash and execute unauthorized code. – Paolo Brandoli Sep 03 '15 at 08:00
  • Think from the user's perspective: they are not going to set permissions before uploading; they just browse for files and upload them. So is there any way to validate DICOM files without the .dcm extension and the application/dicom file type? – Maulik Nov 02 '18 at 05:57