I've got a bunch of .dcm files (DICOM files) and would like to extract the header of each one and save that information in a CSV file.

As you can see in the following picture, I've got a problem with the delimiters:

[Image: part of DICOM header]

For example, I'd like to split the second line in the picture like this:

0002 | 0000 | File Meta Information Group Length | UL | 174

But as you can see, there are not only multiple delimiters; a space (' ') is also sometimes a delimiter and sometimes not. The length of the third column varies as well, so sometimes it holds only a short text, e.g. Image Type further down in the picture.

Does anyone have a clever idea for how to write this to a CSV file?
I use pydicom to read and display the files in my IDE. I'd be very thankful for any advice :)

Amit Joshi
T-Man
  • Please do not post images of text, and especially links to images of text. Links rot, and text should be searchable and cut-n-paste-able. – Mark Tolonen Mar 15 '21 at 15:48
  • Could you add a link to an example file (you could use something like pastebin.com) – Martin Evans Mar 15 '21 at 16:10
  • Sorry for this comment, but it has to be stated that I generally think the idea of converting DICOM into CSV is not very smart. The reason is that CSV "is a table" while DICOM "is a tree". For that reason, CSV is generally inappropriate for converting DICOM files. Your output looks very conversion-unfriendly in general. You may want to give dcmdump from the DCMTK a try, which produces a more uniform output. – Markus Sabin Mar 15 '21 at 16:38
  • Mark: I'll remember it next time, thanks for the remark. Martin: I've uploaded an example file here: https://easyupload.io/sm6idj but darcymason has probably solved my problem :) @kritzel_sw please don't be sorry. I appreciate your comment! I'm quite new to .dcm files. But further down the road I plan on importing the files into a neo4j database, which is why I need the CSV, although it kind of hurts the tree structure. I'll have a look at dcmdump, thank you very much :) – T-Man Mar 16 '21 at 11:55

1 Answer

I would suggest going back to the data elements themselves and working from there, rather than from the string output (which is really meant for exploring in interactive sessions).

The following code should work for a dataset with no sequences; it would need some modification to handle sequences:

import csv
import pydicom
from pydicom.data import get_testdata_file

filename = get_testdata_file("CT_small.dcm")  # substitute your own filename here
ds = pydicom.dcmread(filename)

with open('my.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow("Group Elem Description VR value".split())
    for elem in ds:
        writer.writerow([
            f"{elem.tag.group:04X}", f"{elem.tag.element:04X}",
            elem.name, elem.VR, str(elem.value)
        ])

You may also need to adjust the elem.value part to make it look the way you want, or set the CSV writer to put quotes around items, etc.

Output looks like:

Group,Elem,Description,VR,value
0008,0005,Specific Character Set,CS,ISO_IR 100
0008,0008,Image Type,CS,"['ORIGINAL', 'PRIMARY', 'AXIAL']"
0008,0012,Instance Creation Date,DA,20040119
0008,0013,Instance Creation Time,TM,072731
...
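
As a rough sketch of the two modifications mentioned above (handling sequences and tidying up the value column), something along these lines might work. The helper names format_value, iter_rows and write_csv, the extra "Parent" column and the "<N bytes>" / "<N item(s)>" placeholders are all made up for illustration; none of them are part of pydicom. The sketch assumes each element exposes .tag, .name, .VR and .value as pydicom data elements do, and that the value of an SQ element is a list-like of nested datasets.

```python
import csv
from collections.abc import Sequence


def format_value(elem):
    """Render one element's value as a single CSV cell (hypothetical helper)."""
    value = elem.value
    if isinstance(value, (bytes, bytearray)):
        # Binary payloads such as pixel data: store only their size.
        return f"<{len(value)} bytes>"
    if isinstance(value, Sequence) and not isinstance(value, str):
        # Multi-valued elements: join with the DICOM value separator.
        return "\\".join(str(v) for v in value)
    return str(value)


def iter_rows(ds, parent=""):
    """Yield one row per element, descending into sequence items."""
    for elem in ds:
        group = f"{elem.tag.group:04X}"
        element = f"{elem.tag.element:04X}"
        if elem.VR == "SQ":
            # One row for the sequence itself, then rows for each item,
            # with "Parent" recording where the item sits in the tree.
            yield [parent, group, element, elem.name, elem.VR,
                   f"<{len(elem.value)} item(s)>"]
            for index, item in enumerate(elem.value):
                yield from iter_rows(item, f"{parent}/{group},{element}[{index}]")
        else:
            yield [parent, group, element, elem.name, elem.VR,
                   format_value(elem)]


def write_csv(ds, path):
    """Write all rows to `path`, quoting every field."""
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.writer(csvfile, quoting=csv.QUOTE_ALL)
        writer.writerow(["Parent", "Group", "Elem", "Description", "VR", "Value"])
        writer.writerows(iter_rows(ds))
```

With a dataset read via ds = pydicom.dcmread(filename), calling write_csv(ds, "my.csv") would then produce the file. The "Parent" column is just one way to keep a trace of the tree structure in a flat table; whether that fits a later database import is something to check against your own data.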
darcymason